FAST INERTIAL DYNAMICS AND FISTA ALGORITHMS IN CONVEX OPTIMIZATION.

PERTURBATION ASPECTS. 

HEDY ATTOUCH AND ZAKI CHBANI 


Abstract. In a Hilbert space setting $\mathcal{H}$, we study the fast convergence properties as $t \to +\infty$ of the trajectories of the
second-order differential equation

    $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = g(t)$,

where $\nabla\Phi$ is the gradient of a convex continuously differentiable function $\Phi : \mathcal{H} \to \mathbb{R}$, $\alpha$ is a positive parameter, and
$g : [t_0,+\infty[ \to \mathcal{H}$ is a "small" perturbation term. In this damped inertial system, the viscous damping coefficient $\frac{\alpha}{t}$
vanishes asymptotically, but not too rapidly. For $\alpha \geq 3$ and $\int_{t_0}^{+\infty} t\|g(t)\|\,dt < +\infty$, just assuming that $\operatorname{argmin}\Phi \neq \emptyset$,
we show that any trajectory of the above system satisfies the fast convergence property

    $\Phi(x(t)) - \min_{\mathcal{H}}\Phi \leq \frac{C}{t^2}$.

For $\alpha > 3$, we show that any trajectory converges weakly to a minimizer of $\Phi$, and we show the strong convergence
property in various practical situations. This complements the results obtained by Su-Boyd-Candès, and Attouch-Peypouquet-Redont,
in the unperturbed case $g = 0$. The parallel study of the time discretized version of this system
provides new insight on the effect of errors, or perturbations, on Nesterov type algorithms. We obtain fast convergence
of the values, and convergence of the trajectories, for a perturbed version of the variant of FISTA recently considered
by Chambolle-Dossal, and Su-Boyd-Candès.


1. Introduction 

Throughout the paper, $\mathcal{H}$ is a real Hilbert space which is endowed with the scalar product $\langle\cdot,\cdot\rangle$, with $\|x\|^2 = \langle x,x\rangle$
for any $x \in \mathcal{H}$. Let $\Phi : \mathcal{H} \to \mathbb{R}$ be a convex differentiable function, whose gradient $\nabla\Phi$ is Lipschitz continuous on
bounded sets. We suppose that $S = \operatorname{argmin}\Phi$ is nonempty. Let $\alpha$ be a positive parameter. We are going to study
the asymptotic behaviour (as $t \to +\infty$) of the trajectories of the second-order differential equation

(1)   $(\mathrm{AVD})_{\alpha,g}$   $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = g(t)$

and consider similar questions for the corresponding algorithms. Let $t_0 > 0$ be given. The right-hand side
$g : [t_0,+\infty[ \to \mathcal{H}$ is a perturbation term (integrable source term), such that $g(t)$ is small for large $t$. Precisely, in our
main result, Theorem 2.1, assuming that $\alpha \geq 3$ and $\int_{t_0}^{+\infty} t\|g(t)\|\,dt < +\infty$, we show that any trajectory of (1) satisfies
the fast convergence property

(2)   $\Phi(x(t)) - \min_{\mathcal{H}}\Phi \leq \frac{C}{t^2}$.

This extends the fast convergence of the values obtained by Su, Boyd and Candès in [41] in the unperturbed case
$g = 0$. In Theorem 3.1, when $\alpha > 3$, we show that any trajectory of (1) converges weakly to a minimizer of $\Phi$, which
extends the convergence result obtained by Attouch, Peypouquet, and Redont in [15] in the case $g = 0$.

This inertial system involves a viscous damping which is attached to the term $\frac{\alpha}{t}\dot{x}(t)$. It is an isotropic linear
damping with a viscous parameter $\frac{\alpha}{t}$ which vanishes asymptotically, but not too rapidly. The asymptotic behaviour
of the inertial gradient-like system

(3)   (AVD)   $\ddot{x}(t) + a(t)\dot{x}(t) + \nabla\Phi(x(t)) = 0$,

with Asymptotic Vanishing Damping ((AVD) for short), has been studied by Cabot, Engler and Gadat in [M]-[25].
As a main result, they proved that, under moderate decrease of $a(\cdot)$ to zero, i.e., $a(t) \to 0$ as $t \to +\infty$ with
$\int_0^{\infty} a(t)\,dt = +\infty$, then for any trajectory $x(\cdot)$ of (3)

(4)   $\Phi(x(t)) \to \min\Phi$.

As a striking property, for the specific choice $a(t) = \frac{\alpha}{t}$ with $\alpha \geq 3$, for example when considering

(5)   $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = 0$,

it has been proved by Su, Boyd, and Candès in [41] that the fast convergence property of the values (2) is satisfied by
the trajectories of (5). In the same article [41], the authors show that (5) can be seen as a continuous version of the fast
convergent method of Nesterov, see [31]-[34]. For the continuous dynamic, a related study concerning the case
$a(t) = \frac{1}{t^{\theta}}$, $0 < \theta < 1$, has been developed by Jendoubi and May in [25], with roughly speaking $O(\frac{1}{t^{1+\theta}})$ convergence.

The analysis developed in [25] does not contain the case $a(t) = \frac{\alpha}{t}$, where the introduction of an additional scaling,
due to the coefficient $\alpha$, requires a specific analysis. That is our main concern in this paper.

Our results provide new insight on the effect of perturbations or errors in the associated algorithms. They provide
a guideline for the study of the preservation, under small perturbations, of the fast convergence property of the
corresponding Nesterov type algorithms. Specifically we consider a perturbed version of the variant of FISTA recently
considered by Chambolle and Dossal [26], and Su, Boyd and Candès [41]. We obtain fast convergence of the values
in the case $\alpha \geq 3$, and convergence of the trajectories in the case $\alpha > 3$. Convergence of the trajectories in the case
$\alpha = 3$, which corresponds to Nesterov's algorithm, is still an open question.


2. Fast Convergence of the values 


Let $\Phi : \mathcal{H} \to \mathbb{R}$ be a convex function, whose gradient $\nabla\Phi$ is Lipschitz continuous on bounded sets. Let $t_0 > 0$,
$\alpha > 0$, and $g : [t_0,+\infty[ \to \mathcal{H}$ such that $\int_{t_0}^{+\infty}\|g(t)\|\,dt < +\infty$. We consider the second-order differential equation

(6)   $(\mathrm{AVD})_{\alpha,g}$   $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = g(t)$.

From the Cauchy-Lipschitz theorem, for any Cauchy data $x(t_0) = x_0 \in \mathcal{H}$, $\dot{x}(t_0) = x_1 \in \mathcal{H}$ we immediately infer the
existence and uniqueness of a local solution to (6). The global existence follows from the energy estimate proved in
Proposition 2.1 in the next paragraph. Throughout this paper we will use the following Gronwall-Bellman lemma,
see m Lemme A.5] for a proof. 


Lemma 2.1. Let $m \in L^1(t_0,T;\mathbb{R})$ such that $m \geq 0$ a.e. on $]t_0,T[$ and let $c$ be a nonnegative constant. Suppose that
$w$ is a continuous function from $[t_0,T]$ into $\mathbb{R}$ that satisfies, for all $t \in [t_0,T]$,

    $\frac{1}{2}w^2(t) \leq \frac{1}{2}c^2 + \int_{t_0}^{t} m(\tau)w(\tau)\,d\tau$.

Then, for all $t \in [t_0,T]$,

    $|w(t)| \leq c + \int_{t_0}^{t} m(\tau)\,d\tau$.


2.1. Energy estimates. The following estimates are obtained by considering the global energy of the system, and 
showing that it is a strict Lyapunov function. 


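Proposition 2.1. Suppose $\alpha > 0$, and

    $\int_{t_0}^{+\infty}\|g(t)\|\,dt < +\infty$.

Then, for any orbit $x : [t_0,+\infty[ \to \mathcal{H}$ of $(\mathrm{AVD})_{\alpha,g}$

(7)   $\sup_{t}\|\dot{x}(t)\| < +\infty$,

(8)   $\int_{t_0}^{+\infty}\frac{1}{t}\|\dot{x}(t)\|^2\,dt < +\infty$.

Precisely, for any $t \geq t_0$

(9)   $\|\dot{x}(t)\| \leq \|\dot{x}(t_0)\| + \sqrt{2(\Phi(x_0) - \min_{\mathcal{H}}\Phi)} + \int_{t_0}^{+\infty}\|g(\tau)\|\,d\tau$,

(10)   $\int_{t_0}^{+\infty}\frac{\alpha}{\tau}\|\dot{x}(\tau)\|^2\,d\tau \leq \frac{1}{2}\|\dot{x}(t_0)\|^2 + (\Phi(x_0) - \min\Phi) + \left(\|\dot{x}(t_0)\| + \sqrt{2(\Phi(x_0) - \min\Phi)} + \|g\|_{L^1(t_0,\infty)}\right)\|g\|_{L^1(t_0,\infty)}$.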

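Proof. Let us give some $T > t_0$. For $t_0 \leq t \leq T$, let us define the energy function

(11)   $W_T(t) := \frac{1}{2}\|\dot{x}(t)\|^2 + (\Phi(x(t)) - \min\Phi) + \int_t^T \langle\dot{x}(\tau), g(\tau)\rangle\,d\tau$.

Because $\dot{x}$ is continuous and $g$ is integrable, the energy function $W_T$ is well defined. After time derivation of $W_T$, and
by using $(\mathrm{AVD})_{\alpha,g}$, we obtain

    $\dot{W}_T(t) = \langle\dot{x}(t), \ddot{x}(t) + \nabla\Phi(x(t)) - g(t)\rangle = \langle\dot{x}(t), -\frac{\alpha}{t}\dot{x}(t)\rangle$,

that is

(12)   $\dot{W}_T(t) + \frac{\alpha}{t}\|\dot{x}(t)\|^2 \leq 0$.

Hence $W_T(\cdot)$ is a nonincreasing function. In particular, $W_T(t) \leq W_T(t_0)$, i.e.,

    $\frac{1}{2}\|\dot{x}(t)\|^2 + (\Phi(x(t)) - \min\Phi) + \int_t^T\langle\dot{x}(\tau), g(\tau)\rangle\,d\tau \leq \frac{1}{2}\|\dot{x}(t_0)\|^2 + (\Phi(x_0) - \min\Phi) + \int_{t_0}^T\langle\dot{x}(\tau), g(\tau)\rangle\,d\tau$.

As a consequence,

    $\frac{1}{2}\|\dot{x}(t)\|^2 \leq \frac{1}{2}\|\dot{x}(t_0)\|^2 + (\Phi(x_0) - \min\Phi) + \int_{t_0}^{t}\|\dot{x}(\tau)\|\,\|g(\tau)\|\,d\tau$.

Applying the Gronwall-Bellman Lemma 2.1, we obtain

    $\|\dot{x}(t)\| \leq \left(\|\dot{x}(t_0)\|^2 + 2(\Phi(x_0) - \min\Phi)\right)^{1/2} + \int_{t_0}^{t}\|g(\tau)\|\,d\tau$
    $\leq \|\dot{x}(t_0)\| + \sqrt{2(\Phi(x_0) - \min\Phi)} + \int_{t_0}^{T}\|g(\tau)\|\,d\tau$.

This being true for arbitrary $T > t_0$, and $t_0 \leq t \leq T$, we deduce that

(13)   $\|\dot{x}(t)\| \leq \|\dot{x}(t_0)\| + \sqrt{2(\Phi(x_0) - \min\Phi)} + \int_{t_0}^{+\infty}\|g(\tau)\|\,d\tau$,

which gives (7) and (9). As a consequence, the function $W$ (corresponding to $T = +\infty$)

(14)   $W(t) := \frac{1}{2}\|\dot{x}(t)\|^2 + (\Phi(x(t)) - \min\Phi) + \int_t^{+\infty}\langle\dot{x}(\tau), g(\tau)\rangle\,d\tau$

is well defined, and is minorized by

(15)   $-\|\dot{x}\|_{L^\infty(t_0,+\infty)}\int_{t_0}^{+\infty}\|g(\tau)\|\,d\tau$.

By (12) we have

(16)   $\dot{W}(t) + \frac{\alpha}{t}\|\dot{x}(t)\|^2 \leq 0$.

Integrating (16) from $t_0$ to $t$, and using (13) and (15), we obtain

    $\int_{t_0}^{+\infty}\frac{\alpha}{\tau}\|\dot{x}(\tau)\|^2\,d\tau \leq \frac{1}{2}\|\dot{x}(t_0)\|^2 + (\Phi(x(t_0)) - \min\Phi) + \|\dot{x}\|_{L^\infty(t_0,+\infty)}\int_{t_0}^{+\infty}\|g(\tau)\|\,d\tau$
    $\leq \frac{1}{2}\|\dot{x}(t_0)\|^2 + (\Phi(x(t_0)) - \min\Phi) + \left(\|\dot{x}(t_0)\| + \sqrt{2(\Phi(x_0) - \min\Phi)} + \|g\|_{L^1(t_0,+\infty)}\right)\|g\|_{L^1(t_0,+\infty)} < +\infty$,

which gives (8) and (10). □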


2.2. The main result. Let us state our main result. 

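Theorem 2.1. Suppose that $\alpha \geq 3$, and $\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau < +\infty$. Then, for any orbit $x : [t_0,+\infty[ \to \mathcal{H}$ of $(\mathrm{AVD})_{\alpha,g}$,
we have the following fast convergence of the values:

    $\Phi(x(t)) - \min\Phi = O\left(\frac{1}{t^2}\right)$.

Precisely

(17)   $\frac{2}{\alpha-1}t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) \leq C + 2\left(\left(\frac{C}{\alpha-1}\right)^{1/2} + \frac{1}{\alpha-1}\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau\right)\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau$,

with

    $C = \frac{2}{\alpha-1}t_0^2(\Phi(x_0) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x_0 - x^* + \frac{t_0}{\alpha-1}\dot{x}(t_0)\right\|^2$.

Moreover

(18)   $\sup_{t \geq t_0}\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\| \leq \left(\frac{C}{\alpha-1}\right)^{1/2} + \frac{1}{\alpha-1}\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau < +\infty$.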
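As a numerical illustration of Theorem 2.1 (not part of the original argument), the following sketch integrates $(\mathrm{AVD})_{\alpha,g}$ in dimension one and compares $t^2(\Phi(x(t)) - \min\Phi)$ with the bound given by (17). Everything specific here is an assumption made for the example: the choice $\Phi(x) = x^2/2$ (so $x^* = 0$, $\min\Phi = 0$), the perturbation $g(t) = 1/t^3$ (for which $\int_{t_0}^{+\infty}\tau|g(\tau)|\,d\tau = 1/t_0$), the classical Runge-Kutta scheme, and the hypothetical function names `simulate_avd` and `check_fast_rate`.

```python
import math


def simulate_avd(alpha, grad_phi, g, x0, v0, t0, t_end, dt):
    """Classical RK4 integration of x'' + (alpha/t) x' + grad_phi(x) = g(t), scalar case."""

    def rhs(t, x, v):
        # First-order form: x' = v, v' = g(t) - (alpha/t) v - grad_phi(x).
        return v, g(t) - (alpha / t) * v - grad_phi(x)

    steps = int(round((t_end - t0) / dt))
    t, x, v = t0, x0, v0
    trajectory = [(t, x, v)]
    for _ in range(steps):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        trajectory.append((t, x, v))
    return trajectory


def check_fast_rate(alpha=3.5, t0=1.0, t_end=50.0, dt=0.005, x0=1.0, v0=0.0):
    """Return (max over the run of t^2 * Phi(x(t)), bound on that quantity from (17))."""
    phi = lambda x: 0.5 * x * x      # illustrative choice: min Phi = 0 at x* = 0
    grad_phi = lambda x: x
    g = lambda t: 1.0 / t ** 3       # illustrative perturbation
    integral_tg = 1.0 / t0           # closed form of int_{t0}^{inf} tau * |g(tau)| dtau

    # Constant C of Theorem 2.1 with x* = 0.
    c = (2.0 / (alpha - 1)) * t0 ** 2 * phi(x0) \
        + (alpha - 1) * (x0 + t0 / (alpha - 1) * v0) ** 2
    rhs17 = c + 2.0 * (math.sqrt(c / (alpha - 1)) + integral_tg / (alpha - 1)) * integral_tg
    bound = (alpha - 1) / 2.0 * rhs17

    trajectory = simulate_avd(alpha, grad_phi, g, x0, v0, t0, t_end, dt)
    worst = max(t * t * phi(x) for t, x, _ in trajectory)
    return worst, bound


if __name__ == "__main__":
    worst, bound = check_fast_rate()
    print(f"max t^2 (Phi - min Phi) = {worst:.4f}, bound from (17) = {bound:.4f}")
```

The bound from (17) is typically far from tight on such a simple example; the point of the sketch is only that $t^2(\Phi(x(t)) - \min\Phi)$ stays bounded in the presence of the perturbation.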


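Proof. The proof is an adaptation to our setting (with an integrable source term $g$) of the argument developed by
Su-Boyd-Candès in [41]. Let us give some $T > t_0$, and $x^* \in S = \operatorname{argmin}\Phi$. For $t_0 \leq t \leq T$, let us define the energy
function

(19)   $\mathcal{E}_{\alpha,g,T}(t) := \frac{2}{\alpha-1}t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\|^2 + 2\int_t^T \tau\left\langle x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau), g(\tau)\right\rangle d\tau$.

Let us show that

    $\dot{\mathcal{E}}_{\alpha,g,T}(t) + 2\frac{\alpha-3}{\alpha-1}t(\Phi(x(t)) - \min_{\mathcal{H}}\Phi) \leq 0$.

Derivation of $\mathcal{E}_{\alpha,g,T}(\cdot)$ gives

    $\dot{\mathcal{E}}_{\alpha,g,T}(t) = \frac{4}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + \frac{2}{\alpha-1}t^2\langle\nabla\Phi(x(t)), \dot{x}(t)\rangle$
    $\qquad + 2(\alpha-1)\left\langle x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t),\ \dot{x}(t) + \frac{1}{\alpha-1}\dot{x}(t) + \frac{t}{\alpha-1}\ddot{x}(t)\right\rangle - 2t\left\langle x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t), g(t)\right\rangle$
    $= \frac{4}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + \frac{2}{\alpha-1}t^2\langle\nabla\Phi(x(t)), \dot{x}(t)\rangle$
    $\qquad + 2(\alpha-1)\left\langle x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t),\ \frac{t}{\alpha-1}\left(\frac{\alpha}{t}\dot{x}(t) + \ddot{x}(t) - g(t)\right)\right\rangle$.

Then use $(\mathrm{AVD})_{\alpha,g}$ in this last expression to obtain

(20)   $\dot{\mathcal{E}}_{\alpha,g,T}(t) = \frac{4}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + \frac{2}{\alpha-1}t^2\langle\nabla\Phi(x(t)), \dot{x}(t)\rangle$

(21)   $\qquad - 2t\left\langle x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t), \nabla\Phi(x(t))\right\rangle$

(22)   $= \frac{4}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) - 2t\langle x(t) - x^*, \nabla\Phi(x(t))\rangle$.

By convexity of $\Phi$

    $\Phi(x^*) \geq \Phi(x(t)) + \langle x^* - x(t), \nabla\Phi(x(t))\rangle$.

Replacing in (22) we obtain

    $\dot{\mathcal{E}}_{\alpha,g,T}(t) + \left(2 - \frac{4}{\alpha-1}\right)t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) \leq 0$.

Equivalently

(23)   $\dot{\mathcal{E}}_{\alpha,g,T}(t) + 2\frac{\alpha-3}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) \leq 0$.

As a consequence, for $\alpha \geq 3$, the function $\mathcal{E}_{\alpha,g,T}$ is nonincreasing. In particular, $\mathcal{E}_{\alpha,g,T}(t) \leq \mathcal{E}_{\alpha,g,T}(t_0)$, which gives

    $\frac{2}{\alpha-1}t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\|^2 + 2\int_t^T\tau\left\langle x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau), g(\tau)\right\rangle d\tau$
    $\leq \frac{2}{\alpha-1}t_0^2(\Phi(x_0) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x_0 - x^* + \frac{t_0}{\alpha-1}\dot{x}(t_0)\right\|^2 + 2\int_{t_0}^T\tau\left\langle x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau), g(\tau)\right\rangle d\tau$.

Equivalently

(24)   $\frac{2}{\alpha-1}t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\|^2 \leq C + 2\int_{t_0}^{t}\tau\left\langle x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau), g(\tau)\right\rangle d\tau$,

with

    $C = \frac{2}{\alpha-1}t_0^2(\Phi(x_0) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x_0 - x^* + \frac{t_0}{\alpha-1}\dot{x}(t_0)\right\|^2$.

From (24) we infer

(25)   $\frac{1}{2}\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\|^2 \leq \frac{C}{2(\alpha-1)} + \frac{1}{\alpha-1}\int_{t_0}^{t}\left\|x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau)\right\|\,\|\tau g(\tau)\|\,d\tau$.

Applying once more the Gronwall-Bellman Lemma 2.1, we obtain

(26)   $\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\| \leq \left(\frac{C}{\alpha-1}\right)^{1/2} + \frac{1}{\alpha-1}\int_{t_0}^{t}\tau\|g(\tau)\|\,d\tau$.

Since $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$, it follows that

(27)   $\sup_{t}\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\| \leq \left(\frac{C}{\alpha-1}\right)^{1/2} + \frac{1}{\alpha-1}\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau < +\infty$.

Returning to (24), we conclude that

(28)   $\frac{2}{\alpha-1}t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) \leq C + 2\left(\left(\frac{C}{\alpha-1}\right)^{1/2} + \frac{1}{\alpha-1}\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau\right)\int_{t_0}^{+\infty}\tau\|g(\tau)\|\,d\tau$.

□

Remark 2.1. As a consequence the energy function

(29)   $\mathcal{E}_{\alpha,g}(t) := \frac{2}{\alpha-1}t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) + (\alpha-1)\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\|^2 + 2\int_t^{+\infty}\tau\left\langle x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau), g(\tau)\right\rangle d\tau$

is well defined, and is a Lyapunov function for the dynamical system $(\mathrm{AVD})_{\alpha,g}$.
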

3. Convergence of trajectories 

In the case $\alpha > 3$, provided that the second member $g(t)$ is sufficiently small for large $t$, we are going to show the
convergence of the trajectories of the system

    $(\mathrm{AVD})_{\alpha,g}$   $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = g(t)$.


3.1. Main statement, and preliminary results. The following convergence result is an extension to the perturbed
case (with a source term $g$) of the convergence result obtained by Attouch-Peypouquet-Redont in [15].

Theorem 3.1. Let $\Phi : \mathcal{H} \to \mathbb{R}$ be a convex continuously differentiable function such that $S = \operatorname{argmin}\Phi$ is nonempty.
Suppose that $\alpha > 3$ and $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$. Let $t_0 > 0$, and $x : [t_0,+\infty[ \to \mathcal{H}$ be a classical solution of $(\mathrm{AVD})_{\alpha,g}$.
Then, the following convergence properties hold:

a) (weak convergence) There exists some $x^* \in \operatorname{argmin}\Phi$ such that

(30)   $x(t) \rightharpoonup x^*$ weakly as $t \to +\infty$.

b) (fast convergence) There exists a positive constant $C$ such that

(31)   $\Phi(x(t)) - \min\Phi \leq \frac{C}{t^2}$,

(32)   $\int_{t_0}^{+\infty}t(\Phi(x(t)) - \inf\Phi)\,dt < +\infty$.

c) (energy estimate)

(33)   $\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^2\,dt < +\infty$,

(34)   $\|\dot{x}(t)\| \leq \frac{C}{t}$,

and hence

(35)   $\lim_{t\to\infty}\|\dot{x}(t)\| = 0$.

In order to analyze the convergence properties of the trajectories of system $(\mathrm{AVD})_{\alpha,g}$, we will use Opial's lemma [35],
which we recall in its continuous form; see also [22], who initiated the use of this argument to analyze the asymptotic
convergence of nonlinear contraction semigroups in Hilbert spaces.

Lemma 3.1. Let $S$ be a nonempty subset of $\mathcal{H}$ and $x : [0,+\infty[ \to \mathcal{H}$ a map. Assume that
(i) for every $z \in S$, $\lim_{t\to+\infty}\|x(t) - z\|$ exists;
(ii) every weak sequential cluster point of the map $x$ belongs to $S$.
Then

    $w\text{-}\lim_{t\to+\infty}x(t) = x_\infty$ exists, for some element $x_\infty \in S$.

We also need the following result concerning the integration of a first-order nonautonomous differential inequality,
see [15].

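Lemma 3.2. Suppose that $\delta > 0$, and let $w : [\delta,+\infty[ \to \mathbb{R}$ be a continuously differentiable function that satisfies the
following differential inequality

(36)   $\dot{w}(t) + \frac{\alpha}{t}w(t) \leq k(t)$,

for some $\alpha > 1$, and some nonnegative function $k : [\delta,+\infty[ \to \mathbb{R}$ such that $t \mapsto tk(t) \in L^1(\delta,+\infty)$. Then

(37)   $w^+ \in L^1(\delta,+\infty)$.

3.2. Proof of the convergence results.

Proof. Step 1. Let us return to the decrease property (23), which is satisfied by the Lyapunov function $\mathcal{E}_{\alpha,g}$:

    $\dot{\mathcal{E}}_{\alpha,g}(t) + 2\frac{\alpha-3}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) \leq 0$.

By integration of this inequality, we obtain

    $\mathcal{E}_{\alpha,g}(t) + 2\frac{\alpha-3}{\alpha-1}\int_{t_0}^{t}\tau(\Phi(x(\tau)) - \inf_{\mathcal{H}}\Phi)\,d\tau \leq \mathcal{E}_{\alpha,g}(t_0)$.

By definition of $\mathcal{E}_{\alpha,g}$, and neglecting its nonnegative terms, we infer

    $2\int_t^{+\infty}\tau\left\langle x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau), g(\tau)\right\rangle d\tau + 2\frac{\alpha-3}{\alpha-1}\int_{t_0}^{t}\tau(\Phi(x(\tau)) - \inf_{\mathcal{H}}\Phi)\,d\tau \leq \mathcal{E}_{\alpha,g}(t_0)$.

Hence

    $2\frac{\alpha-3}{\alpha-1}\int_{t_0}^{t}\tau(\Phi(x(\tau)) - \inf_{\mathcal{H}}\Phi)\,d\tau \leq \mathcal{E}_{\alpha,g}(t_0) + 2\int_{t_0}^{+\infty}\left\|x(\tau) - x^* + \frac{\tau}{\alpha-1}\dot{x}(\tau)\right\|\,\|\tau g(\tau)\|\,d\tau$.

By (27), we have

    $\sup_t\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\| < +\infty$.

As a consequence

    $2\frac{\alpha-3}{\alpha-1}\int_{t_0}^{t}\tau(\Phi(x(\tau)) - \inf_{\mathcal{H}}\Phi)\,d\tau \leq \mathcal{E}_{\alpha,g}(t_0) + 2\sup_t\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\|\int_{t_0}^{+\infty}\|\tau g(\tau)\|\,d\tau$.

Since $\alpha > 3$, we deduce that

(38)   $\int_{t_0}^{+\infty}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi)\,dt < +\infty$.

Step 2. Let us show that

    $\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^2\,dt < +\infty$.
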

To that end, we use the energy estimate which is obtained by taking the scalar product of $(\mathrm{AVD})_{\alpha,g}$ with $t^2\dot{x}(t)$:

(39)   $t^2\langle\ddot{x}(t), \dot{x}(t)\rangle + \alpha t\|\dot{x}(t)\|^2 + t^2\langle\nabla\Phi(x(t)), \dot{x}(t)\rangle = t^2\langle g(t), \dot{x}(t)\rangle$.

By the classical derivation chain rule, and the Cauchy-Schwarz inequality, we obtain

(40)   $\frac{1}{2}t^2\frac{d}{dt}\|\dot{x}(t)\|^2 + \alpha t\|\dot{x}(t)\|^2 + t^2\frac{d}{dt}\Phi(x(t)) \leq \|tg(t)\|\,\|t\dot{x}(t)\|$.

After integration by parts

    $\frac{t^2}{2}\|\dot{x}(t)\|^2 - \frac{t_0^2}{2}\|\dot{x}(t_0)\|^2 - \int_{t_0}^{t}s\|\dot{x}(s)\|^2\,ds + \alpha\int_{t_0}^{t}s\|\dot{x}(s)\|^2\,ds$
    $\quad + t^2(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) - t_0^2(\Phi(x(t_0)) - \inf_{\mathcal{H}}\Phi) - 2\int_{t_0}^{t}s(\Phi(x(s)) - \inf_{\mathcal{H}}\Phi)\,ds \leq \int_{t_0}^{t}\|sg(s)\|\,\|s\dot{x}(s)\|\,ds$.

As a consequence, for some constant $C \geq 0$, depending only on the Cauchy data,

(41)   $\frac{t^2}{2}\|\dot{x}(t)\|^2 + (\alpha-1)\int_{t_0}^{t}s\|\dot{x}(s)\|^2\,ds \leq C + 2\int_{t_0}^{t}s(\Phi(x(s)) - \inf_{\mathcal{H}}\Phi)\,ds + \int_{t_0}^{t}\|sg(s)\|\,\|s\dot{x}(s)\|\,ds$.

By (38) we have $\int_{t_0}^{\infty}s(\Phi(x(s)) - \inf_{\mathcal{H}}\Phi)\,ds < +\infty$. Moreover $\alpha > 1$. As a consequence, from (41) we deduce that, for
some other constant $C$

(42)   $\frac{1}{2}\|t\dot{x}(t)\|^2 \leq C + \int_{t_0}^{t}\|sg(s)\|\,\|s\dot{x}(s)\|\,ds$.

Applying the Gronwall-Bellman Lemma 2.1, we obtain

    $\|t\dot{x}(t)\| \leq \sqrt{2C} + \int_{t_0}^{t}\|sg(s)\|\,ds$.

Since $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$, we infer

(43)   $\sup_t\|t\dot{x}(t)\| < +\infty$.

Returning to (41), we deduce that

(44)   $(\alpha-1)\int_{t_0}^{t}s\|\dot{x}(s)\|^2\,ds \leq C + 2\int_{t_0}^{\infty}s(\Phi(x(s)) - \inf_{\mathcal{H}}\Phi)\,ds + \sup_t\|t\dot{x}(t)\|\int_{t_0}^{\infty}\|sg(s)\|\,ds$,

which gives

    $\int_{t_0}^{\infty}t\|\dot{x}(t)\|^2\,dt < +\infty$.

Moreover, combining (27),

    $\sup_t\left\|x(t) - x^* + \frac{t}{\alpha-1}\dot{x}(t)\right\| < +\infty$,

with (43), we deduce that

(45)   $\sup_t\|x(t)\| < +\infty$,

i.e., all the orbits are bounded.


Step 3. Our proof of the weak convergence property of the orbits of $(\mathrm{AVD})_{\alpha,g}$ relies on Opial's lemma. Given
$x^* \in \operatorname{argmin}\Phi$, let us define $h : [t_0,+\infty[ \to \mathbb{R}^+$ by

(46)   $h(t) = \frac{1}{2}\|x(t) - x^*\|^2$.

By the classical derivation chain rule

(47)   $\dot{h}(t) = \langle x(t) - x^*, \dot{x}(t)\rangle$,

(48)   $\ddot{h}(t) = \langle x(t) - x^*, \ddot{x}(t)\rangle + \|\dot{x}(t)\|^2$.

Combining these two equations, and using $(\mathrm{AVD})_{\alpha,g}$, we obtain

(49)   $\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) = \|\dot{x}(t)\|^2 + \left\langle x(t) - x^*, \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t)\right\rangle$

(50)   $\qquad = \|\dot{x}(t)\|^2 + \langle x(t) - x^*, -\nabla\Phi(x(t)) + g(t)\rangle$.

By monotonicity of $\nabla\Phi$ and $\nabla\Phi(x^*) = 0$

(51)   $\langle x(t) - x^*, -\nabla\Phi(x(t))\rangle \leq 0$.

By (49), (50) and (51) we infer

(52)   $\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) \leq \|\dot{x}(t)\|^2 + \|x(t) - x^*\|\,\|g(t)\|$.

Equivalently

(53)   $\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) \leq k(t)$,

with

    $k(t) := \|\dot{x}(t)\|^2 + \|x(t) - x^*\|\,\|g(t)\|$.

By (45) the orbit is bounded. Hence, for some constant $C \geq 0$

    $k(t) \leq \|\dot{x}(t)\|^2 + C\|g(t)\|$.

By assumption $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$, and by (33) $\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^2\,dt < +\infty$. Hence $t \mapsto tk(t) \in L^1(t_0,+\infty)$. Applying
Lemma 3.2 with $w(t) = \dot{h}(t)$, we deduce that $w^+ \in L^1(t_0,+\infty)$. Equivalently $\dot{h}^+ \in L^1(t_0,+\infty)$, which, since $h$ is nonnegative, implies that
the limit of $h(t)$ exists as $t \to +\infty$. This proves item (i) of Opial's lemma. We complete the proof by observing
that item (ii) is satisfied too. Indeed, since $\Phi(x(t))$ converges to $\inf\Phi$ and $\Phi$, being convex and continuous, is weakly lower
semicontinuous, every weak sequential cluster point of $x(\cdot)$ is a minimizer of $\Phi$. □


3.3. Strong convergence results. Since the work of J.B. Baillon, we know that without additional assumptions,
the trajectories of gradient systems may fail to converge strongly. Let us examine some situations of practical interest
where strong convergence of the trajectories of $(\mathrm{AVD})_{\alpha,g}$ is satisfied.

Strong convergence under $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$.

We will need the following result, see ([15], Lemma 5.4).

Lemma 3.3. Suppose that $\delta > 0$, and let $f : [\delta,+\infty[ \to \mathcal{H}$ be a continuous function that satisfies $f \in L^1(\delta,+\infty;\mathcal{H})$.
Suppose that $\alpha > 1$ and $x : [\delta,+\infty[ \to \mathcal{H}$ is a classical solution of

    $t\ddot{x}(t) + \alpha\dot{x}(t) = f(t)$.

Then, $x(t)$ converges strongly in $\mathcal{H}$ as $t \to \infty$.

Theorem 3.2. Suppose that $\alpha > 3$, $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$, and $\Phi$ satisfies $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$. Let $x(\cdot)$ be a classical
global solution of equation (1). Then, there exists some $x^* \in \operatorname{argmin}\Phi$ such that $x(t) \to x^*$ strongly as $t \to +\infty$.

Proof. We follow the same approach as that proposed in [15, Theorem 3.1]. We first observe that the assumption
$\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$ implies the existence of some $z \in \operatorname{argmin}\Phi$ and $\rho > 0$ such that, for all $x \in \mathcal{H}$, $\langle\nabla\Phi(x), x - z\rangle \geq \rho\|\nabla\Phi(x)\|$.
In particular, for all $t \geq t_0$

    $\langle\nabla\Phi(x(t)), x(t) - z\rangle \geq \rho\|\nabla\Phi(x(t))\|$.

Combining this inequality with (22), written with $x^* = z$ (that we recall below),

    $\dot{\mathcal{E}}_{\alpha,g}(t) = \frac{4}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi) - 2t\langle x(t) - z, \nabla\Phi(x(t))\rangle$,

we obtain

(54)   $\dot{\mathcal{E}}_{\alpha,g}(t) + 2\rho t\|\nabla\Phi(x(t))\| \leq \frac{4}{\alpha-1}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi)$.

Let us return to (23), which after integration, and using $\alpha > 3$, gives

    $\int_{t_0}^{\infty}t(\Phi(x(t)) - \inf_{\mathcal{H}}\Phi)\,dt < +\infty$.

As a consequence, by integrating (54), we deduce that

    $\int_{t_0}^{\infty}t\|\nabla\Phi(x(t))\|\,dt < +\infty$.

By setting $f(t) = tg(t) - t\nabla\Phi(x(t))$, we can rewrite equation (1) as

    $t\ddot{x}(t) + \alpha\dot{x}(t) = f(t)$.

Since all assumptions of Lemma 3.3 are satisfied, we can affirm that $x(t)$ converges strongly to some $x^* \in \mathcal{H}$. Recalling
that $\Phi(x(t)) \to \inf_{\mathcal{H}}\Phi$ and that $\Phi$ is continuous, we obtain $x^* \in \operatorname{argmin}\Phi$. □

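Strong convergence in the case of an even function.

Recall that $\Phi : \mathcal{H} \to \mathbb{R}$ is an even function if $\Phi(-x) = \Phi(x)$ for all $x \in \mathcal{H}$. In this case, $0 \in \operatorname{argmin}\Phi$.

Theorem 3.3. Suppose that $\alpha > 3$, $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$, and $\Phi$ is an even function. Let $x(\cdot)$ be a classical global
solution of equation (1). Then, there exists some $x^* \in \operatorname{argmin}\Phi$ such that $x(t)$ converges strongly to $x^*$ as $t \to +\infty$.

Proof. Set, for $t_0 \leq \tau \leq r$,

    $y(\tau) = \|x(\tau)\|^2 - \|x(r)\|^2 - \frac{1}{2}\|x(\tau) - x(r)\|^2$.

By differentiating twice, we obtain

    $\dot{y}(\tau) = \langle\dot{x}(\tau), x(\tau) + x(r)\rangle$

and

(55)   $\ddot{y}(\tau) = \|\dot{x}(\tau)\|^2 + \langle\ddot{x}(\tau), x(\tau) + x(r)\rangle$.

From these two equations and (1), we deduce that

    $\ddot{y}(\tau) + \frac{\alpha}{\tau}\dot{y}(\tau) = \|\dot{x}(\tau)\|^2 + \left\langle\ddot{x}(\tau) + \frac{\alpha}{\tau}\dot{x}(\tau), x(\tau) + x(r)\right\rangle$
    $\qquad = \|\dot{x}(\tau)\|^2 + \langle g(\tau) - \nabla\Phi(x(\tau)), x(\tau) + x(r)\rangle$.

Let us now consider the energy function $W(\tau) = \frac{1}{2}\|\dot{x}(\tau)\|^2 + \Phi(x(\tau)) + \int_\tau^{\infty}\langle\dot{x}(t), g(t)\rangle\,dt$. We have $\frac{d}{d\tau}W(\tau) =
-\frac{\alpha}{\tau}\|\dot{x}(\tau)\|^2$, and therefore $W$ is a nonincreasing function. As a consequence, $W(\tau) \geq W(r)$, which equivalently gives

    $\frac{1}{2}\|\dot{x}(\tau)\|^2 + \Phi(x(\tau)) \geq \frac{1}{2}\|\dot{x}(r)\|^2 + \Phi(x(r)) - \int_\tau^{r}\langle\dot{x}(t), g(t)\rangle\,dt$.

Using the convex differential inequality $\Phi(-x(r)) \geq \Phi(x(\tau)) - \langle\nabla\Phi(x(\tau)), x(\tau) + x(r)\rangle$, and the even property of $\Phi$,
$\Phi(x(r)) = \Phi(-x(r))$, we deduce that

    $\frac{1}{2}\|\dot{x}(\tau)\|^2 \geq -\langle\nabla\Phi(x(\tau)), x(\tau) + x(r)\rangle - \int_\tau^{r}\langle\dot{x}(t), g(t)\rangle\,dt$.

Returning to (55), we finally obtain

    $\ddot{y}(\tau) + \frac{\alpha}{\tau}\dot{y}(\tau) \leq \frac{3}{2}\|\dot{x}(\tau)\|^2 + \langle g(\tau), x(\tau) + x(r)\rangle + \int_\tau^{r}\langle\dot{x}(t), g(t)\rangle\,dt$.

Let us recall that, by Theorem 3.1, the trajectory $x(\cdot)$ is converging weakly, and hence bounded. Moreover, by (34),
we have $\|\dot{x}(t)\| \leq \frac{C}{t}$. Hence, for some constant $C$

(56)   $\ddot{y}(\tau) + \frac{\alpha}{\tau}\dot{y}(\tau) \leq k(\tau) := \frac{3}{2}\|\dot{x}(\tau)\|^2 + C\|g(\tau)\| + C\int_\tau^{+\infty}\frac{1}{t}\|g(t)\|\,dt$.

Let us observe that the function $k$ does not depend on $r$. Let us verify that $\tau \mapsto \tau k(\tau) \in L^1(t_0,+\infty)$. By Theorem
3.1 we have $\int_{t_0}^{\infty}t\|\dot{x}(t)\|^2\,dt < +\infty$. By assumption, $\int_{t_0}^{\infty}t\|g(t)\|\,dt < +\infty$. Moreover, by Fubini's theorem

    $\int_{t_0}^{\infty}\tau\left(\int_\tau^{\infty}\frac{1}{t}\|g(t)\|\,dt\right)d\tau \leq \frac{1}{2}\int_{t_0}^{\infty}t\|g(t)\|\,dt < +\infty$.

By integration of (56), by a similar argument as in Lemma 3.2, we obtain

(57)   $\dot{y}(\tau) \leq \frac{C}{\tau^\alpha} + \frac{1}{\tau^\alpha}\int_{t_0}^{\tau}u^\alpha k(u)\,du$,

where $C = 2t_0^\alpha\|\dot{x}(t_0)\|\,\|x\|_{L^\infty}$. Set

    $K(\tau) := \frac{C}{\tau^\alpha} + \frac{1}{\tau^\alpha}\int_{t_0}^{\tau}u^\alpha k(u)\,du$.

By using Fubini's theorem once more, and the fact that $\tau \mapsto \tau k(\tau) \in L^1(t_0,+\infty)$, we deduce that $K \in L^1(t_0,+\infty)$.
Integrating $\dot{y}(\tau) \leq K(\tau)$ from $t$ to $r$, with $t_0 \leq t \leq r$, and using $y(r) = 0$, we obtain

    $\frac{1}{2}\|x(t) - x(r)\|^2 \leq \|x(t)\|^2 - \|x(r)\|^2 + \int_t^{r}K(\tau)\,d\tau$.

Since $\Phi$ is even, we have $0 \in \operatorname{argmin}\Phi$. Hence $\lim_{t\to+\infty}\|x(t)\|^2$ exists (see the proof of Theorem 3.1). As a consequence,
$x(t)$ has the Cauchy property as $t \to +\infty$, and hence converges.

□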


4. The case $\operatorname{argmin}\Phi = \emptyset$.

Theorem 4.1. Suppose $\alpha > 0$, $\int_{t_0}^{+\infty}\|g(t)\|\,dt < +\infty$, and $\inf\Phi > -\infty$. Then, for any orbit $x : [t_0,+\infty[ \to \mathcal{H}$ of
$(\mathrm{AVD})_{\alpha,g}$, the following minimizing property holds

    $\lim_{t\to+\infty}\Phi(x(t)) = \inf_{\mathcal{H}}\Phi$.

We will use the following lemma, see [15].

Lemma 4.1. Take $\delta > 0$, and let $f \in L^1(\delta,+\infty)$ be nonnegative. Consider a nondecreasing continuous function
$\psi : (\delta,+\infty) \to (0,+\infty)$ such that $\lim_{t\to+\infty}\psi(t) = +\infty$. Then,

    $\lim_{t\to+\infty}\frac{1}{\psi(t)}\int_\delta^{t}\psi(s)f(s)\,ds = 0$.
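To get a feeling for Lemma 4.1, the following sketch evaluates the weighted average $\frac{1}{\psi(t)}\int_\delta^t\psi(s)f(s)\,ds$ numerically for $\psi(t) = \ln t$, the weight used later in the proof of Theorem 4.1. The concrete choices $\delta = 2$ and $f(s) = 1/s^2$, the midpoint rule in the variable $u = \ln s$, and the function names are assumptions made for illustration only. For this particular $f$ the integral has the closed form $\int_2^t \frac{\ln s}{s^2}\,ds = \frac{\ln 2 + 1}{2} - \frac{\ln t + 1}{t}$, which the sketch uses as a cross-check.

```python
import math


def weighted_average(psi, f, delta, t, n=20000):
    """Midpoint-rule approximation of (1/psi(t)) * int_delta^t psi(s) f(s) ds.

    The substitution s = exp(u) is used so that large t can be handled
    with a moderate number of quadrature nodes.
    """
    a, b = math.log(delta), math.log(t)
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        s = math.exp(a + (i + 0.5) * h)
        total += psi(s) * f(s) * s  # the extra factor s comes from ds = s du
    return total * h / psi(t)


def exact_average(t):
    """Closed form for psi = ln, f(s) = 1/s^2, delta = 2."""
    integral = (math.log(2.0) + 1.0) / 2.0 - (math.log(t) + 1.0) / t
    return integral / math.log(t)


if __name__ == "__main__":
    for t in (1e2, 1e4, 1e6):
        approx = weighted_average(math.log, lambda s: 1.0 / s ** 2, 2.0, t)
        print(f"t = {t:.0e}: numerical {approx:.6f}, closed form {exact_average(t):.6f}")
```

The decay is slow here (of order $1/\ln t$), which is consistent with the lemma: it asserts convergence to zero, not a rate.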

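Proof of Theorem 4.1. Let us first return to the proof of the energy estimates in Proposition 2.1. Replacing $\min\Phi$
by $\inf\Phi$ in the expression of the energy function, we obtain by the same argument

(58)   $\sup_t\|\dot{x}(t)\| < +\infty$,

(59)   $\int_{t_0}^{+\infty}\frac{1}{t}\|\dot{x}(t)\|^2\,dt < +\infty$.

Consider the function $h(t) = \frac{1}{2}\|x(t) - z\|^2$, where this time $z$ is an arbitrary element of $\mathcal{H}$. We can easily verify that

    $\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) = \|\dot{x}(t)\|^2 - \langle\nabla\Phi(x(t)), x(t) - z\rangle + \langle g(t), x(t) - z\rangle$.

By convexity of $\Phi$, we obtain

(60)   $\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + \Phi(x(t)) - \Phi(z) \leq \|\dot{x}(t)\|^2 + \langle g(t), x(t) - z\rangle$.

Consider the energy function

    $W(t) = \frac{1}{2}\|\dot{x}(t)\|^2 + \Phi(x(t)) - \inf\Phi + \int_t^{\infty}\langle\dot{x}(s), g(s)\rangle\,ds$.

By classical derivation rules, and $(\mathrm{AVD})_{\alpha,g}$,

    $\frac{d}{dt}W(t) = \langle\dot{x}(t), \ddot{x}(t) + \nabla\Phi(x(t)) - g(t)\rangle = -\frac{\alpha}{t}\|\dot{x}(t)\|^2 \leq 0$.

As a consequence, $W$ is a nonincreasing function. Moreover, $W$ is minorized by $-\|\dot{x}\|_{L^\infty}\int_{t_0}^{\infty}\|g(s)\|\,ds$. Hence, there
exists some $W_\infty \in \mathbb{R}$ such that $W(t) \to W_\infty$ as $t \to \infty$. Let us take advantage of this property, and reformulate (60)
with the help of $W$:

(61)   $\ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + W(t) + \inf\Phi - \Phi(z) \leq \frac{3}{2}\|\dot{x}(t)\|^2 + \langle g(t), x(t) - z\rangle + \int_t^{\infty}\langle\dot{x}(s), g(s)\rangle\,ds$.

For every $t > 0$, $W(t) \geq W_\infty$. Setting $B_\infty = W_\infty + \inf\Phi - \Phi(z)$, we obtain

    $B_\infty \leq \frac{3}{2}\|\dot{x}(t)\|^2 + \|g(t)\|\,\|x(t) - z\| + \|\dot{x}\|_{L^\infty}\int_t^{\infty}\|g(s)\|\,ds - \frac{1}{t^\alpha}\frac{d}{dt}\left(t^\alpha\dot{h}(t)\right)$.

Multiplying this last inequality by $\frac{1}{t}$ and integrating between two reals $0 < t_0 < \theta$, we get

(62)   $B_\infty\ln\left(\frac{\theta}{t_0}\right) \leq \frac{3}{2}\int_{t_0}^{\theta}\frac{1}{t}\|\dot{x}(t)\|^2\,dt + \int_{t_0}^{\theta}\frac{\|g(t)\|\,\|x(t) - z\|}{t}\,dt + \|\dot{x}\|_{L^\infty}\int_{t_0}^{\theta}\frac{1}{t}\left(\int_t^{\infty}\|g(s)\|\,ds\right)dt - \int_{t_0}^{\theta}\frac{1}{t^{\alpha+1}}\frac{d}{dt}\left(t^\alpha\dot{h}(t)\right)dt$.

Let us estimate the integrals in the second member of (62):

(1) By (59), $\int_{t_0}^{\infty}\frac{1}{t}\|\dot{x}(t)\|^2\,dt < +\infty$.

(2) Exploiting the relation $\|x(t) - z\| \leq \|x(t_0) - z\| + \int_{t_0}^{t}\|\dot{x}(s)\|\,ds$, we obtain

    $\int_{t_0}^{\theta}\frac{\|g(t)\|\,\|x(t) - z\|}{t}\,dt \leq \left(\frac{\|x_0 - z\|}{t_0} + \|\dot{x}\|_{L^\infty}\right)\int_{t_0}^{+\infty}\|g(t)\|\,dt < +\infty$.

(3) After integration by parts

    $\int_{t_0}^{\theta}\frac{1}{t}\left(\int_t^{\infty}\|g(s)\|\,ds\right)dt = \ln\theta\int_\theta^{\infty}\|g(s)\|\,ds - \ln t_0\int_{t_0}^{\infty}\|g(s)\|\,ds + \int_{t_0}^{\theta}\|g(t)\|\ln t\,dt$.

(4) Set $I = \int_{t_0}^{\theta}\frac{1}{t^{\alpha+1}}\frac{d}{dt}\left(t^\alpha\dot{h}(t)\right)dt$. By integrating by parts twice

    $I = \left[\frac{1}{t}\dot{h}(t)\right]_{t_0}^{\theta} + (\alpha+1)\int_{t_0}^{\theta}\frac{1}{t^2}\dot{h}(t)\,dt$
    $\quad = C + \frac{1}{\theta}\dot{h}(\theta) + \frac{\alpha+1}{\theta^2}h(\theta) + 2(1+\alpha)\int_{t_0}^{\theta}\frac{1}{t^3}h(t)\,dt$.

Since $h \geq 0$, we have $-I \leq -C - \frac{1}{\theta}\dot{h}(\theta)$. Then notice that $|\dot{h}(\theta)| = |\langle\dot{x}(\theta), x(\theta) - z\rangle| \leq \|\dot{x}\|_{L^\infty}\left(\|x_0 - z\| +
\theta\|\dot{x}\|_{L^\infty}\right)$, so that $\frac{1}{\theta}|\dot{h}(\theta)|$ remains bounded.

Collecting the above results, we deduce from (62) that

(63)   $B_\infty\ln\left(\frac{\theta}{t_0}\right) \leq C + \|\dot{x}\|_{L^\infty}\ln\theta\int_\theta^{\infty}\|g(s)\|\,ds + \|\dot{x}\|_{L^\infty}\int_{t_0}^{\theta}\|g(t)\|\ln t\,dt$.

Dividing by $\ln\left(\frac{\theta}{t_0}\right)$, and letting $\theta \to +\infty$, thanks to Lemma 4.1 with $\psi(t) = \ln t$, we conclude that $B_\infty \leq 0$. Equivalently,
for every $z \in \mathcal{H}$, $W_\infty \leq \Phi(z) - \inf\Phi$, which leads to $W_\infty \leq 0$.

On the other hand, it is easy to see that $W(t) \geq \Phi(x(t)) - \inf\Phi - \|\dot{x}\|_{L^\infty}\int_t^{+\infty}\|g(s)\|\,ds$. Passing to the limit, as
$t \to +\infty$, we deduce that

    $0 \geq W_\infty \geq \limsup\Phi(x(t)) - \inf\Phi$.

Since we always have $\inf\Phi \leq \liminf\Phi(x(t))$, we conclude that $\lim_{t\to+\infty}\Phi(x(t)) = \inf\Phi$. □

Remark 4.1. In [M], in the unperturbed case $g = 0$, it has been observed that, when $\operatorname{argmin}\Phi = \emptyset$, the fast convergence
property of the values, as given in Theorem 2.1, may fail to be satisfied. A fortiori, without making additional
assumptions on the perturbation term, we also lose the fast convergence property in the perturbed case (take $g = 0$!).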

5. From continuous to discrete dynamics and algorithms 

Time discretization of dissipative gradient-based dynamical systems leads naturally to algorithms, which, under 
appropriate assumptions, have similar convergence properties. This approach has been followed successfully in a 
variety of situations. For a general abstract discussion see m, 0 , and in the case of dynamics with inertial features 
see 0, a, i, na, m- To cover practical situations involving constraints and/or nonsmooth data, we need to 
broaden our scope. This leads us to consider the non-smooth structured convex minimization problem 

(64)   $\min\{\Phi(x) + \Psi(x) : x \in \mathcal{H}\}$

where

• $\Phi : \mathcal{H} \to \mathbb{R}\cup\{+\infty\}$ is a convex lower semicontinuous proper function (which possibly takes the value $+\infty$);

• $\Psi : \mathcal{H} \to \mathbb{R}$ is a convex continuously differentiable function, whose gradient is Lipschitz continuous.

The optimal solutions of (64) satisfy

    $\partial\Phi(x) + \nabla\Psi(x) \ni 0$,

where $\partial\Phi$ is the subdifferential of $\Phi$ in the sense of convex analysis. In order to adapt our dynamic to this non-smooth
situation, we will consider the corresponding differential inclusion

(65)   $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \partial\Phi(x(t)) + \nabla\Psi(x(t)) \ni g(t)$.

This dynamic is within the following framework

(66)   $\ddot{x}(t) + a(t)\dot{x}(t) + \partial\Theta(x(t)) \ni g(t)$,

where $\Theta : \mathcal{H} \to \mathbb{R}\cup\{+\infty\}$ is a convex lower semicontinuous proper function, and $a(\cdot)$ is a positive damping parameter.

The detailed study of this differential inclusion goes far beyond the scope of the present article, see [10] for some
results in the case of a fixed positive damping parameter, i.e., $a(t) = \gamma > 0$ fixed, and $g = 0$. A formal analysis of
this system shows that the Lyapunov analysis, which has been developed in the previous sections, still holds, as long
as one does not use the Lipschitz continuity property of the gradient (cocoercivity property). This is based on the
fact that the convexity (subdifferential) inequalities are still valid, as well as the (generalized) derivation chain rule,
see [20]. Thus, setting $\Theta(x) = \Phi(x) + \Psi(x)$, we can reasonably expect that, for $\alpha > 3$ and $\int_{t_0}^{+\infty}t\|g(t)\|\,dt < +\infty$, for
each trajectory of (65), there is rapid convergence of the values,

    $\Theta(x(t)) - \min\Theta \leq \frac{C}{t^2}$,

and weak convergence of the trajectory to an optimal solution.

Indeed, we are going to use these ideas as a guideline, and so introduce corresponding fast converging algorithms,
making the link with Nesterov [31]-[34], Beck-Teboulle [19], and so extending the recent works of Chambolle-Dossal
[26], Su-Boyd-Candès [41], Attouch-Peypouquet-Redont [14] to the perturbed case. As a basic ingredient of the
discretization procedure, in order to preserve the fast convergence properties of the dynamical system (65), we are
going to discretize it implicitly with respect to the nonsmooth function $\Phi$, and explicitly with respect to the smooth
function $\Psi$.

Taking a fixed time step size $h > 0$, and setting $t_k = kh$, $x_k = x(t_k)$, the implicit/explicit finite difference scheme
for (65) gives

(67)   $\frac{1}{h^2}(x_{k+1} - 2x_k + x_{k-1}) + \frac{\alpha}{kh^2}(x_k - x_{k-1}) + \partial\Phi(x_{k+1}) + \nabla\Psi(y_k) \ni g_k$,

where $y_k$ is a linear combination of $x_k$ and $x_{k-1}$, that will be made precise further. After developing (67), we obtain

(68)   $x_{k+1} + h^2\partial\Phi(x_{k+1}) \ni x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1}) - h^2\nabla\Psi(y_k) + h^2g_k$.
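
The passage from (67) to (68) is a pure regrouping of terms. As a sketch of the intermediate steps (multiply (67) by $h^2$, then move every term not involving $x_{k+1}$ to the right-hand side; no assumption beyond the notation $t_k = kh$ is used):

```latex
\begin{align*}
& \frac{1}{h^2}(x_{k+1} - 2x_k + x_{k-1}) + \frac{\alpha}{kh^2}(x_k - x_{k-1})
  + \partial\Phi(x_{k+1}) + \nabla\Psi(y_k) \ni g_k \\
\iff\ & x_{k+1} - 2x_k + x_{k-1} + \frac{\alpha}{k}(x_k - x_{k-1})
  + h^2\,\partial\Phi(x_{k+1}) + h^2\,\nabla\Psi(y_k) \ni h^2 g_k \\
\iff\ & x_{k+1} + h^2\,\partial\Phi(x_{k+1}) \ni x_k + (x_k - x_{k-1})
  - \frac{\alpha}{k}(x_k - x_{k-1}) - h^2\,\nabla\Psi(y_k) + h^2 g_k \\
\iff\ & x_{k+1} + h^2\,\partial\Phi(x_{k+1}) \ni x_k
  + \Bigl(1 - \frac{\alpha}{k}\Bigr)(x_k - x_{k-1}) - h^2\,\nabla\Psi(y_k) + h^2 g_k .
\end{align*}
```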

A natural choice for $y_k$ leading to a simple formulation of the algorithm (other choices are possible, offering new
directions of research for the future) is

(69)   $y_k = x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1})$.

Using the classical proximal operator (equivalently, the resolvent operator of the maximal monotone operator $\partial\Phi$)

(70)   $\operatorname{prox}_{\gamma\Phi}(x) = \operatorname{argmin}_{\xi\in\mathcal{H}}\left\{\Phi(\xi) + \frac{1}{2\gamma}\|\xi - x\|^2\right\} = (I + \gamma\partial\Phi)^{-1}(x)$

and setting $s = h^2$, the algorithm can be written as

(71)   $y_k = x_k + \left(1 - \frac{\alpha}{k}\right)(x_k - x_{k-1})$;
       $x_{k+1} = \operatorname{prox}_{s\Phi}\left(y_k - s(\nabla\Psi(y_k) - g_k)\right)$.

For practical purposes, and in order to fit with the existing literature on the subject, it is convenient to work with the
following equivalent formulation

(72)   $(\mathrm{AVD})_{\alpha,g}$-algo
       $y_k = x_k + \frac{k-1}{k+\alpha-1}(x_k - x_{k-1})$;
       $x_{k+1} = \operatorname{prox}_{s\Phi}\left(y_k - s(\nabla\Psi(y_k) - g_k)\right)$.

Indeed, we have $\frac{k-1}{k+\alpha-1} = 1 - \frac{\alpha}{k+\alpha-1}$. When $\alpha$ is an integer, up to the reindexation $k \mapsto k + \alpha - 1$, we obtain the same
sequences $(x_k)$ and $(y_k)$. For general $\alpha > 0$, we can easily verify that the algorithm $(\mathrm{AVD})_{\alpha,g}$-algo is still associated
with the dynamical system (65).

This algorithm is within the scope of the proximal-based inertial algorithms 0 , es], m, and forward-backward 
methods. In the unperturbed case, $g_k = 0$, it has been recently considered by Chambolle-Dossal [26], Su-Boyd-Candès
[41], and Attouch-Peypouquet-Redont [14]. It enjoys fast convergence properties which are very similar to those of the
continuous dynamic.

For $\alpha = 3$, $g_k = 0$, we recover the classical algorithm based on the ideas of Nesterov and Güler, and developed by
Beck-Teboulle (FISTA)

(73)   $y_k = x_k + \frac{k-1}{k+2}(x_k - x_{k-1})$;
       $x_{k+1} = \operatorname{prox}_{s\Phi}\left(y_k - s\nabla\Psi(y_k)\right)$.

An important question regarding the (FISTA) method, as described in (73), is the convergence of the sequences $(x_k)$
and $(y_k)$. Indeed, it is still an open question. A major interest of considering the broader context of $(\mathrm{AVD})_{\alpha,g}$-algo
algorithms is that, for $\alpha > 3$, these sequences converge, and that errors/perturbations are allowed, which makes room for
approximation methods. We will see that the proof of the convergence properties of $(\mathrm{AVD})_{\alpha,g}$-algo algorithms can be obtained in
a parallel way with the convergence analysis in the continuous case in Theorem 3.1.


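Theorem 5.1. Let $\Phi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be a convex lower semicontinuous proper function, and $\Psi : \mathcal{H} \to \mathbb{R}$ a convex continuously differentiable function, whose gradient is $L$-Lipschitz continuous. Suppose that $S = \operatorname{argmin}(\Phi + \Psi)$ is nonempty. Suppose that $\alpha \geq 3$, $0 < s < \frac{1}{L}$, and $\sum_{k \in \mathbb{N}} k\|g_k\| < +\infty$. Let $(x_k)$ be a sequence generated by the algorithm $(\mathrm{AVD})_{\alpha,g}$-algo. Then,

$$ (\Phi + \Psi)(x_k) - \min(\Phi + \Psi) = O\left(\frac{1}{k^2}\right). $$

Precisely,

$$ (\Phi + \Psi)(x_k) - \min(\Phi + \Psi) \leq \frac{C(\alpha - 1)}{2s(k + \alpha - 2)^2}, \tag{74} $$

with $C$ given by

$$ C = \mathcal{G}(0) + 2s \left( \sum_{j=0}^{\infty} (j + \alpha - 1)\|g_j\| \right) \left( \sqrt{\frac{\mathcal{G}(0)}{\alpha - 1}} + \frac{2s}{\alpha - 1} \sum_{j=0}^{\infty} (j + \alpha - 1)\|g_j\| \right), $$

where, with $\Theta = \Phi + \Psi$, $\Theta^* = \min\Theta$ and $x^* \in S$,

$$ \mathcal{G}(0) = \frac{2s}{\alpha - 1}(\alpha - 2)^2 \left(\Theta(x_0) - \Theta^*\right) + (\alpha - 1)\|y_0 - x^*\|^2. $$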


Proof. To simplify notations, we set $\Theta = \Phi + \Psi$, and take $x^* \in \operatorname{argmin}\Theta$, i.e., $\Theta(x^*) = \inf\Theta$. In a parallel way to the continuous case, our proof is based on proving that $(\mathcal{E}(k))$ is a non-increasing sequence, where $\mathcal{E}(k)$ is the discrete version of the Lyapunov function $\mathcal{E}_{\alpha,g}(t)$ (we shall justify further that it is well defined), and which is given by

$$ \mathcal{E}(k) := \frac{2s}{\alpha-1}(k+\alpha-2)^2\left(\Theta(x_k) - \Theta(x^*)\right) + (\alpha-1)\|z_k - x^*\|^2 + \sum_{j=k}^{\infty} 2s(j+\alpha-1)\langle g_j, z_{j+1} - x^* \rangle, \tag{75} $$

with

$$ z_k = \frac{k+\alpha-1}{\alpha-1}\, y_k - \frac{k}{\alpha-1}\, x_k. \tag{76} $$


In the passage from the continuous to the discrete setting, we recall that we must use the reindexing $k \mapsto k + \alpha - 1$. Note that $\mathcal{E}(k)$ is equal to the Lyapunov function considered by Su-Boyd-Candes in [44, Theorem 4.3], plus a perturbation term. Let us introduce the function $\Psi_k : \mathcal{H} \to \mathbb{R}$ which is defined by

$$ \forall y \in \mathcal{H}, \quad \Psi_k(y) := \Psi(y) - \langle g_k, y \rangle. $$

We also set

$$ \Theta_k = \Phi + \Psi_k. $$

We have $\nabla\Psi_k(y) = \nabla\Psi(y) - g_k$, and hence $\nabla\Psi_k$ is still $L$-Lipschitz continuous. We can reformulate our algorithm with the help of $\Psi_k$ as follows


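$$ (\mathrm{AVD})_{\alpha,g}\text{-algo} \quad \begin{cases} y_k = x_k + \frac{k-1}{k+\alpha-1}(x_k - x_{k-1}); \\ x_{k+1} = \operatorname{prox}_{s\Phi}\left(y_k - s\nabla\Psi_k(y_k)\right). \end{cases} \tag{77} $$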


In order to analyze the convergence properties of the above algorithm, it is convenient to introduce the operator $G_{s,k} : \mathcal{H} \to \mathcal{H}$ which is defined by, for all $y \in \mathcal{H}$,

$$ G_{s,k}(y) = \frac{1}{s}\left(y - \operatorname{prox}_{s\Phi}\left(y - s\nabla\Psi_k(y)\right)\right). $$

Equivalently,

$$ \operatorname{prox}_{s\Phi}\left(y - s\nabla\Psi_k(y)\right) = y - sG_{s,k}(y), $$

and the algorithm (77) can be formulated as

$$ (\mathrm{AVD})_{\alpha,g}\text{-algo} \quad \begin{cases} y_k = x_k + \frac{k-1}{k+\alpha-1}(x_k - x_{k-1}); \\ x_{k+1} = y_k - sG_{s,k}(y_k). \end{cases} \tag{78} $$

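The variable $z_k$, which is defined in (76) by $z_k = \frac{k+\alpha-1}{\alpha-1} y_k - \frac{k}{\alpha-1} x_k$, will play an important role. It comes naturally into play as a discrete version of the term $\frac{t}{\alpha-1}\dot{x}(t) + x(t) - x^*$ which enters $\mathcal{E}_{\alpha,g}(t)$. Indeed,

$$ \frac{k+\alpha-1}{\alpha-1}(x_{k+1} - x_k) + x_k = \frac{k+\alpha-1}{\alpha-1}\, x_{k+1} - \frac{k}{\alpha-1}\, x_k = z_{k+1}, \tag{79} $$

where the last equality comes from (80) below. Let us examine the recursive relation satisfied by $z_k$. We have

\begin{align}
z_{k+1} &= \frac{k+\alpha}{\alpha-1}\, y_{k+1} - \frac{k+1}{\alpha-1}\, x_{k+1} \notag \\
&= \frac{k+\alpha}{\alpha-1}\left(x_{k+1} + \frac{k}{k+\alpha}(x_{k+1} - x_k)\right) - \frac{k+1}{\alpha-1}\, x_{k+1} \notag \\
&= \frac{k+\alpha-1}{\alpha-1}\, x_{k+1} - \frac{k}{\alpha-1}\, x_k \tag{80} \\
&= \frac{k+\alpha-1}{\alpha-1}\left(y_k - sG_{s,k}(y_k)\right) - \frac{k}{\alpha-1}\, x_k \notag \\
&= z_k - \frac{s}{\alpha-1}(k+\alpha-1)\, G_{s,k}(y_k). \tag{81}
\end{align}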
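The recursion (81) is a purely algebraic consequence of (76) and (78), so it can be checked numerically along any run of the algorithm. The following is a small sanity-check sketch, assuming the same scalar toy problem as a stand-in ($\Phi(x) = |x|$, $\Psi(x) = \frac{1}{2}(x-3)^2$) and a hypothetical error sequence $g_k = 1/(k+1)^3$; the function name `max_recursion_gap` is introduced here for illustration only.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.| evaluated at the scalar v."""
    return max(v - t, 0.0) - max(-v - t, 0.0)


def max_recursion_gap(alpha=3.5, s=0.4, n_iter=50):
    """Largest |z_{k+1} - (z_k - s/(alpha-1)*(k+alpha-1)*G_{s,k}(y_k))| over k."""
    g = lambda k: 1.0 / (k + 1) ** 3           # hypothetical error sequence
    x_prev, x, worst = 0.0, 0.0, 0.0
    for k in range(n_iter):
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)
        z = (k + alpha - 1) / (alpha - 1) * y - k / (alpha - 1) * x   # definition (76)
        x_next = soft_threshold(y - s * ((y - 3.0) - g(k)), s)       # step of (72)
        G = (y - x_next) / s                                          # G_{s,k}(y_k)
        y_next = x_next + k / (k + alpha) * (x_next - x)
        z_next = (k + alpha) / (alpha - 1) * y_next - (k + 1) / (alpha - 1) * x_next
        predicted = z - s / (alpha - 1) * (k + alpha - 1) * G         # right side of (81)
        worst = max(worst, abs(z_next - predicted))
        x_prev, x = x, x_next
    return worst
```

The returned gap should be at the level of floating-point rounding, whatever the data, since (81) does not use convexity or the size of the errors.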


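We now use the classical formula in the proximal gradient (also called forward-backward) analysis (see [15], [25], [37]): for any $x, y \in \mathcal{H}$

$$ \Theta_k\left(y - sG_{s,k}(y)\right) \leq \Theta_k(x) + \langle G_{s,k}(y), y - x \rangle - \frac{s}{2}\|G_{s,k}(y)\|^2. \tag{82} $$

Note that this formula is valid since $s < \frac{1}{L}$, and $\nabla\Psi_k$ is $L$-Lipschitz continuous. Let us write successively this formula at $y = y_k$ and $x = x_k$, then at $y = y_k$ and $x = x^*$. We obtain

$$ \Theta_k\left(y_k - sG_{s,k}(y_k)\right) \leq \Theta_k(x_k) + \langle G_{s,k}(y_k), y_k - x_k \rangle - \frac{s}{2}\|G_{s,k}(y_k)\|^2, $$

$$ \Theta_k\left(y_k - sG_{s,k}(y_k)\right) \leq \Theta_k(x^*) + \langle G_{s,k}(y_k), y_k - x^* \rangle - \frac{s}{2}\|G_{s,k}(y_k)\|^2. $$

Multiplying the first inequality by $\frac{k}{k+\alpha-1}$, and the second by $\frac{\alpha-1}{k+\alpha-1}$, then adding the two resulting inequalities, and using $x_{k+1} = y_k - sG_{s,k}(y_k)$, we obtain

\begin{align}
\Theta_k(x_{k+1}) \leq{} & \frac{k}{k+\alpha-1}\Theta_k(x_k) + \frac{\alpha-1}{k+\alpha-1}\Theta_k(x^*) - \frac{s}{2}\|G_{s,k}(y_k)\|^2 \tag{83} \\
& + \left\langle G_{s,k}(y_k), \frac{k}{k+\alpha-1}(y_k - x_k) + \frac{\alpha-1}{k+\alpha-1}(y_k - x^*) \right\rangle. \tag{84}
\end{align}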

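Let us rewrite the scalar product in (84) as follows:

\begin{align}
\left\langle G_{s,k}(y_k), \frac{k}{k+\alpha-1}(y_k - x_k) + \frac{\alpha-1}{k+\alpha-1}(y_k - x^*) \right\rangle
&= \frac{\alpha-1}{k+\alpha-1}\left\langle G_{s,k}(y_k), \frac{k}{\alpha-1}(y_k - x_k) + y_k - x^* \right\rangle \tag{85} \\
&= \frac{\alpha-1}{k+\alpha-1}\left\langle G_{s,k}(y_k), \frac{k+\alpha-1}{\alpha-1}\, y_k - \frac{k}{\alpha-1}\, x_k - x^* \right\rangle \notag \\
&= \frac{\alpha-1}{k+\alpha-1}\langle G_{s,k}(y_k), z_k - x^* \rangle. \notag
\end{align}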


Combining (83)-(84) with (85), we obtain

$$ \Theta_k(x_{k+1}) \leq \frac{k}{k+\alpha-1}\Theta_k(x_k) + \frac{\alpha-1}{k+\alpha-1}\Theta_k(x^*) + \frac{\alpha-1}{k+\alpha-1}\langle G_{s,k}(y_k), z_k - x^* \rangle - \frac{s}{2}\|G_{s,k}(y_k)\|^2. \tag{86} $$

In order to write (86) in a recursive form, we use the relation (81) satisfied by $z_k$, which gives

$$ z_{k+1} - x^* = z_k - x^* - \frac{s}{\alpha-1}(k+\alpha-1)\, G_{s,k}(y_k). $$

After developing

$$ \|z_{k+1} - x^*\|^2 = \|z_k - x^*\|^2 - 2\frac{s}{\alpha-1}(k+\alpha-1)\langle z_k - x^*, G_{s,k}(y_k) \rangle + \frac{s^2}{(\alpha-1)^2}(k+\alpha-1)^2\|G_{s,k}(y_k)\|^2, $$

and multiplying the above expression by $\frac{(\alpha-1)^2}{2s(k+\alpha-1)^2}$, we obtain

$$ \frac{(\alpha-1)^2}{2s(k+\alpha-1)^2}\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right) = \frac{\alpha-1}{k+\alpha-1}\langle G_{s,k}(y_k), z_k - x^* \rangle - \frac{s}{2}\|G_{s,k}(y_k)\|^2. $$

Replacing this expression in (86), we obtain

$$ \Theta_k(x_{k+1}) \leq \frac{k}{k+\alpha-1}\Theta_k(x_k) + \frac{\alpha-1}{k+\alpha-1}\Theta_k(x^*) + \frac{(\alpha-1)^2}{2s(k+\alpha-1)^2}\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right). \tag{87} $$

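Equivalently

$$ \Theta_k(x_{k+1}) - \Theta_k(x^*) \leq \frac{k}{k+\alpha-1}\left(\Theta_k(x_k) - \Theta_k(x^*)\right) + \frac{(\alpha-1)^2}{2s(k+\alpha-1)^2}\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right). \tag{88} $$

Returning to $\Theta(y) = \Theta_k(y) + \langle g_k, y \rangle$, we obtain

\begin{align}
\Theta(x_{k+1}) - \Theta(x^*) \leq{} & \frac{k}{k+\alpha-1}\left(\Theta(x_k) - \Theta(x^*)\right) + \frac{(\alpha-1)^2}{2s(k+\alpha-1)^2}\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right) \tag{89} \\
& + \langle g_k, x_{k+1} - x^* \rangle - \frac{k}{k+\alpha-1}\langle g_k, x_k - x^* \rangle. \notag
\end{align}

After reduction

\begin{align}
\Theta(x_{k+1}) - \Theta(x^*) \leq{} & \frac{k}{k+\alpha-1}\left(\Theta(x_k) - \Theta(x^*)\right) + \frac{(\alpha-1)^2}{2s(k+\alpha-1)^2}\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right) \tag{90} \\
& + \left\langle g_k, x_{k+1} - x_k + \frac{\alpha-1}{k+\alpha-1}(x_k - x^*) \right\rangle. \notag
\end{align}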

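Multiplying by $\frac{2s}{\alpha-1}(k+\alpha-1)^2$, we obtain

\begin{align}
\frac{2s}{\alpha-1}(k+\alpha-1)^2\left(\Theta(x_{k+1}) - \Theta(x^*)\right) \leq{} & \frac{2s}{\alpha-1}\, k(k+\alpha-1)\left(\Theta(x_k) - \Theta(x^*)\right) + (\alpha-1)\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right) \tag{91} \\
& + \frac{2s}{\alpha-1}(k+\alpha-1)^2\left\langle g_k, x_{k+1} - x_k + \frac{\alpha-1}{k+\alpha-1}(x_k - x^*) \right\rangle. \notag
\end{align}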

For $\alpha \geq 3$ one can easily verify that

$$ k(k+\alpha-1) \leq (k+\alpha-2)^2. $$

More precisely

$$ k(k+\alpha-1) = (k+\alpha-2)^2 - k(\alpha-3) - (\alpha-2)^2 \leq (k+\alpha-2)^2 - k(\alpha-3). $$


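As a consequence, from (91) we deduce that

\begin{align}
\frac{2s}{\alpha-1}(k+\alpha-1)^2\left(\Theta(x_{k+1}) - \Theta(x^*)\right) + 2s\frac{\alpha-3}{\alpha-1}\, k\left(\Theta(x_k) - \Theta(x^*)\right) \leq{} & \frac{2s}{\alpha-1}(k+\alpha-2)^2\left(\Theta(x_k) - \Theta(x^*)\right) \tag{92} \\
& + (\alpha-1)\left(\|z_k - x^*\|^2 - \|z_{k+1} - x^*\|^2\right) \notag \\
& + \frac{2s}{\alpha-1}(k+\alpha-1)^2\left\langle g_k, x_{k+1} - x_k + \frac{\alpha-1}{k+\alpha-1}(x_k - x^*) \right\rangle. \notag
\end{align}

Setting

$$ \mathcal{G}(k) = \frac{2s}{\alpha-1}(k+\alpha-2)^2\left(\Theta(x_k) - \Theta^*\right) + (\alpha-1)\|z_k - x^*\|^2, \tag{93} $$

we can reformulate (92) as

$$ \mathcal{G}(k+1) + 2s\frac{\alpha-3}{\alpha-1}\, k\left(\Theta(x_k) - \Theta(x^*)\right) \leq \mathcal{G}(k) + \frac{2s}{\alpha-1}(k+\alpha-1)^2\left\langle g_k, x_{k+1} - x_k + \frac{\alpha-1}{k+\alpha-1}(x_k - x^*) \right\rangle. \tag{94} $$

Equivalently

$$ \mathcal{G}(k+1) + 2s\frac{\alpha-3}{\alpha-1}\, k\left(\Theta(x_k) - \Theta(x^*)\right) \leq \mathcal{G}(k) + 2s(k+\alpha-1)\left\langle g_k, \frac{k+\alpha-1}{\alpha-1}(x_{k+1} - x_k) + x_k - x^* \right\rangle. \tag{95} $$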

Using (79),

$$ z_{k+1} = \frac{k+\alpha-1}{\alpha-1}(x_{k+1} - x_k) + x_k, $$

we deduce that

$$ \mathcal{G}(k+1) + 2s\frac{\alpha-3}{\alpha-1}\, k\left(\Theta(x_k) - \Theta(x^*)\right) \leq \mathcal{G}(k) + 2s(k+\alpha-1)\langle g_k, z_{k+1} - x^* \rangle. \tag{96} $$
We now develop a similar analysis as in the continuous case. Given some integer $K$, set

$$ \mathcal{E}_K(k) = \mathcal{G}(k) + \sum_{j=k}^{K} 2s(j+\alpha-1)\langle g_j, z_{j+1} - x^* \rangle. $$

Then (96) is equivalent to

$$ \mathcal{E}_K(k+1) + 2s\frac{\alpha-3}{\alpha-1}\, k\left(\Theta(x_k) - \Theta(x^*)\right) \leq \mathcal{E}_K(k). \tag{97} $$

Hence, the sequence $(\mathcal{E}_K(k))$ is nonincreasing. In particular $\mathcal{E}_K(k) \leq \mathcal{E}_K(0)$, which gives

$$ \mathcal{G}(k) + \sum_{j=k}^{K} 2s(j+\alpha-1)\langle g_j, z_{j+1} - x^* \rangle \leq \mathcal{G}(0) + \sum_{j=0}^{K} 2s(j+\alpha-1)\langle g_j, z_{j+1} - x^* \rangle. $$

As a consequence

$$ \mathcal{G}(k) \leq \mathcal{G}(0) + \sum_{j=0}^{k-1} 2s(j+\alpha-1)\langle g_j, z_{j+1} - x^* \rangle. \tag{98} $$

By definition of $\mathcal{G}(k)$, neglecting some positive terms, and by Cauchy-Schwarz inequality, we infer

$$ (\alpha-1)\|z_k - x^*\|^2 \leq \mathcal{G}(0) + 2s\sum_{j=0}^{k-1}(j+\alpha-1)\|g_j\|\,\|z_{j+1} - x^*\|. $$

Equivalently

$$ \|z_k - x^*\|^2 \leq \frac{1}{\alpha-1}\mathcal{G}(0) + \frac{2s}{\alpha-1}\sum_{j=1}^{k}(j+\alpha-2)\|g_{j-1}\|\,\|z_j - x^*\|. \tag{99} $$


We then use the following result, a discrete version of Gronwall's lemma.

Lemma 5.1. Let $(a_k)$ be a sequence of positive real numbers such that

$$ a_k^2 \leq c + \sum_{j=1}^{k} \beta_j a_j, $$

where $(\beta_j)$ is a sequence of positive real numbers such that $\sum_j \beta_j < +\infty$, and $c$ is a positive real number. Then

$$ a_k \leq \sqrt{c} + \sum_{j=1}^{\infty} \beta_j. $$

Proof. Set $A_k := \sup_{1 \leq j \leq k} a_j$. Then, for $1 \leq l \leq k$

$$ a_l^2 \leq c + \sum_{j=1}^{l} \beta_j a_j \leq c + A_k\sum_{j=1}^{\infty} \beta_j. $$

Passing to the supremum with respect to $l$, with $1 \leq l \leq k$, we obtain

$$ A_k^2 \leq c + A_k\sum_{j=1}^{\infty} \beta_j. $$

By elementary algebraic computation, it follows that

$$ A_k \leq \sqrt{c} + \sum_{j=1}^{\infty} \beta_j. $$


□ 
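Lemma 5.1 can be illustrated numerically on the extremal sequence for which the hypothesis holds with equality. The sketch below is only an illustration under assumptions chosen here: the weights $\beta_j = 1/j^2$, the constant $c = 2$, and the helper name `extremal_sequence` are hypothetical and not part of the paper.

```python
import math


def extremal_sequence(c, betas):
    """Build a_1, a_2, ... with a_k^2 = c + sum_{j<=k} beta_j * a_j (equality case)."""
    a, partial = [], 0.0            # partial holds sum_{j<k} beta_j * a_j
    for beta in betas:
        # a_k is the positive root of a^2 - beta*a - (c + partial) = 0
        a_k = 0.5 * (beta + math.sqrt(beta * beta + 4.0 * (c + partial)))
        a.append(a_k)
        partial += beta * a_k
    return a


c = 2.0
betas = [1.0 / j**2 for j in range(1, 2001)]   # summable weights (toy choice)
a = extremal_sequence(c, betas)
bound = math.sqrt(c) + sum(betas)              # right-hand side of Lemma 5.1
```

Every term of `a` should stay below `bound`; the finite sum of the weights already suffices here, since the proof only uses $\beta_j$ for $j \leq k$.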


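Following the proof of Theorem 5.1. From (99), applying Lemma 5.1 with $a_k = \|z_k - x^*\|$, we deduce that

$$ \|z_k - x^*\| \leq M := \sqrt{\frac{\mathcal{G}(0)}{\alpha-1}} + \frac{2s}{\alpha-1}\sum_{j=0}^{\infty}(j+\alpha-1)\|g_j\|. \tag{100} $$

Note that $M$ is finite, because of the assumption $\sum_{k \in \mathbb{N}} k\|g_k\| < +\infty$. Returning to (98) we obtain

$$ \mathcal{G}(k) \leq C := \mathcal{G}(0) + 2s\left(\sum_{j=0}^{\infty}(j+\alpha-1)\|g_j\|\right)\left(\sqrt{\frac{\mathcal{G}(0)}{\alpha-1}} + \frac{2s}{\alpha-1}\sum_{j=0}^{\infty}(j+\alpha-1)\|g_j\|\right). \tag{101} $$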
By definition of $\mathcal{G}(k)$, and the positivity of its constitutive elements, we finally obtain

$$ \frac{2s}{\alpha-1}(k+\alpha-2)^2\left(\Theta(x_k) - \Theta^*\right) \leq C, $$

which gives (74). □
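The estimate (74) can be observed on a small example. The following sketch is a numerical illustration only, under assumptions made here: the scalar toy problem $\Phi(x) = |x|$, $\Psi(x) = \frac{1}{2}(x-3)^2$ (so $L = 1$, $x^* = 2$), an error sequence with finitely many nonzero terms ($g_k = 0.1$ for $k < 5$, so that the sums entering $C$ are finite sums), and the start $x_{-1} = x_0$, which gives $y_0 = z_0 = x_0$. The name `check_rate` is hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t*|.| evaluated at the scalar v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def theta(x):
    """Theta = Phi + Psi with Phi(x) = |x| and Psi(x) = 0.5*(x - 3)^2 (toy choice)."""
    return abs(x) + 0.5 * (x - 3.0) ** 2


def check_rate(alpha=3.0, s=0.5, n_iter=200):
    """Return (True if (74) held at every k < n_iter, the constant C)."""
    x_star, x0 = 2.0, 0.0
    g = lambda k: 0.1 if k < 5 else 0.0      # finitely many nonzero errors
    # Constants of Theorem 5.1, with z_0 = y_0 = x_0.
    G0 = (2 * s / (alpha - 1) * (alpha - 2) ** 2 * (theta(x0) - theta(x_star))
          + (alpha - 1) * (x0 - x_star) ** 2)
    S = sum((j + alpha - 1) * abs(g(j)) for j in range(5))
    C = G0 + 2 * s * S * (math.sqrt(G0 / (alpha - 1)) + 2 * s / (alpha - 1) * S)
    x_prev, x, ok = x0, x0, True
    for k in range(n_iter):
        gap = theta(x) - theta(x_star)
        bound = C * (alpha - 1) / (2 * s * (k + alpha - 2) ** 2)
        ok = ok and gap <= bound + 1e-12
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)
        x_prev, x = x, soft_threshold(y - s * ((y - 3.0) - g(k)), s)
    return ok, C
```

On such an easy problem the bound is far from tight; the point of the sketch is only to make the roles of $\mathcal{G}(0)$, $C$ and the index shift $k + \alpha - 2$ concrete.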


Remark 5.1. In the particular case $\alpha = 3$, for a perturbed version of the classical FISTA algorithm, Schmidt, Le Roux, and Bach proved in [40] a result similar to Theorem 5.1 concerning the fast convergence of the values.

Let us now study the convergence of the sequence $(x_k)$.


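Theorem 5.2. Let $\Phi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be a convex lower semicontinuous proper function, and $\Psi : \mathcal{H} \to \mathbb{R}$ a convex continuously differentiable function, whose gradient is $L$-Lipschitz continuous. Suppose that $S = \operatorname{argmin}(\Phi + \Psi)$ is nonempty. Suppose that $\alpha > 3$, $0 < s < \frac{1}{L}$, and $\sum_{k \in \mathbb{N}} k\|g_k\| < +\infty$. Let $(x_k)$ be a sequence generated by the algorithm $(\mathrm{AVD})_{\alpha,g}$-algo. Then,

i) $\sum_k k\left((\Phi + \Psi)(x_k) - \inf(\Phi + \Psi)\right) < +\infty$;

ii) $\sum_k k\|x_{k+1} - x_k\|^2 < +\infty$;

iii) $(x_k)$ converges weakly, as $k \to +\infty$, to some $x^* \in \operatorname{argmin}(\Phi + \Psi)$.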


Proof. The proof is parallel to that of Theorem 3.1.
Step 1. Let us return to (96),

$$ \mathcal{G}(k+1) + 2s\frac{\alpha-3}{\alpha-1}\, k\left(\Theta(x_k) - \Theta(x^*)\right) \leq \mathcal{G}(k) + 2s(k+\alpha-1)\langle g_k, z_{k+1} - x^* \rangle. $$

By (100), we know that the sequence $(z_k)$ is bounded. Summing the above inequalities, and using $\alpha > 3$, we obtain

$$ \sum_k k\left((\Phi + \Psi)(x_k) - \inf(\Phi + \Psi)\right) < +\infty, \tag{102} $$

that is, item i).

Step 2. Now apply the fundamental inequality (82), which can be equivalently written as follows

$$ \Theta_k\left(y - sG_{s,k}(y)\right) + \frac{1}{2s}\|y - sG_{s,k}(y) - x\|^2 \leq \Theta_k(x) + \frac{1}{2s}\|x - y\|^2. \tag{103} $$

Take $y = y_k$, and $x = x_k$. Since $x_{k+1} = y_k - sG_{s,k}(y_k)$, and $y_k - x_k = \frac{k-1}{k+\alpha-1}(x_k - x_{k-1})$, we obtain

$$ \Theta_k(x_{k+1}) + \frac{1}{2s}\|x_{k+1} - x_k\|^2 \leq \Theta_k(x_k) + \frac{1}{2s}\frac{(k-1)^2}{(k+\alpha-1)^2}\|x_k - x_{k-1}\|^2. \tag{104} $$

Equivalently, by definition of $\Theta_k$,

$$ \Theta(x_{k+1}) + \frac{1}{2s}\|x_{k+1} - x_k\|^2 \leq \Theta(x_k) + \frac{1}{2s}\frac{(k-1)^2}{(k+\alpha-1)^2}\|x_k - x_{k-1}\|^2 + \langle g_k, x_{k+1} - x_k \rangle. \tag{105} $$

To shorten notations, set $\theta_k = \Theta(x_k) - \Theta(x^*)$, $d_k = \frac{1}{2}\|x_k - x_{k-1}\|^2$, $a = \alpha - 1$. By Cauchy-Schwarz inequality, and with these notations, (105) gives

$$ \frac{1}{s}\left(d_{k+1} - \frac{(k-1)^2}{(k+a)^2}\, d_k\right) \leq (\theta_k - \theta_{k+1}) + \|g_k\|\,\|x_{k+1} - x_k\|. \tag{106} $$

After multiplication by $(k+a)^2$, we obtain

$$ \frac{1}{s}\left((k+a)^2 d_{k+1} - (k-1)^2 d_k\right) \leq (k+a)^2(\theta_k - \theta_{k+1}) + (k+a)^2\|g_k\|\,\|x_{k+1} - x_k\|. \tag{107} $$

Summing from $k = 1$ to $k = K$ gives

$$ \sum_{k=1}^{K}\left((k+a)^2 d_{k+1} - (k-1)^2 d_k\right) \leq s\sum_{k=1}^{K}(k+a)^2(\theta_k - \theta_{k+1}) + s\sum_{k=1}^{K}(k+a)^2\|g_k\|\,\|x_{k+1} - x_k\|. \tag{108} $$

By a similar computation as in Chambolle-Dossal [25, Corollary 2], we equivalently obtain

$$ (K+a)^2 d_{K+1} + \sum_{k=2}^{K} a(2k + a - 2)\, d_k \leq s\left((a+1)^2\theta_1 - (K+a)^2\theta_{K+1} + \sum_{k=2}^{K}(2k + 2a - 1)\,\theta_k + \sum_{k=1}^{K}(k+a)^2\|g_k\|\,\|x_{k+1} - x_k\|\right). \tag{109} $$
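The passage from (108) to (109) is a pure reindexing of the left-hand sum (the term $k = 1$ of $(k-1)^2 d_k$ vanishes), and the same manipulation applies to the $\theta_k$ sum on the right. A quick numerical check of the left-hand identity, on arbitrary data, can be sketched as follows; the values of `K`, `a` and the random sequence `d` are assumptions made purely for illustration.

```python
import random


def both_sides(a, d, K):
    """Left-hand side of (108) and the rearranged left-hand side of (109)."""
    left = sum((k + a) ** 2 * d[k + 1] - (k - 1) ** 2 * d[k] for k in range(1, K + 1))
    right = (K + a) ** 2 * d[K + 1] + sum(
        a * (2 * k + a - 2) * d[k] for k in range(2, K + 1)
    )
    return left, right


random.seed(0)
K, a = 40, 2.5                                 # a = alpha - 1 with alpha = 3.5
d = [random.random() for _ in range(K + 2)]    # d[0] is unused
left, right = both_sides(a, d, K)
```

The two values agree up to rounding because $(k - 1 + a)^2 - (k-1)^2 = a(2k + a - 2)$.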
By (102) we have $\sum_k (2k + 2a - 1)\,\theta_k < +\infty$. Hence there exists some constant $C$ such that, for all $K \in \mathbb{N}$

$$ (K+a)^2\|x_{K+1} - x_K\|^2 \leq C + 2s\sum_{k=1}^{K}(k+a)^2\|g_k\|\,\|x_{k+1} - x_k\|. \tag{110} $$

We now proceed with an argument parallel to that used in the proof of Theorem 3.1. Let us write (110) as follows, with $r_k := (k+a)\|x_{k+1} - x_k\|$

$$ r_k^2 \leq C + 2s\sum_{j=1}^{k}(j+a)\|g_j\|\, r_j. \tag{111} $$


We appeal to the following discrete version of the Gronwall-Bellman lemma.

Lemma 5.2. Let $(r_k)$ be a sequence of positive real numbers such that, for all $k \geq 1$

$$ r_k^2 \leq C + \sum_{j=1}^{k} \omega_j r_j, $$

where $C$ is a positive constant, and $\sum_{j \in \mathbb{N}} \omega_j < +\infty$, with $\omega_j \geq 0$. Then the sequence $(r_k)$ is bounded with

$$ r_k \leq \sqrt{C} + \sum_{j \in \mathbb{N}} \omega_j. $$

Proof. For simplicity, let us assume $\omega_j > 0$ (one can always reduce to this situation by adding some positive constant, arbitrarily small; see Brezis [20] for the proof of this lemma in the continuous case). Set $A_k := C + \sum_{j=1}^{k} \omega_j r_j$, $A_0 = C$. We have $r_k^2 \leq A_k$, and $A_{k+1} - A_k = \omega_{k+1} r_{k+1}$. Equivalently $r_{k+1} = \frac{A_{k+1} - A_k}{\omega_{k+1}}$, which gives

$$ \frac{A_{k+1} - A_k}{\omega_{k+1}} \leq \sqrt{A_{k+1}}, $$

and hence

$$ \frac{A_{k+1} - A_k}{\sqrt{A_{k+1}}} \leq \omega_{k+1}. $$

From this, using that the sequence $(A_k)$ is increasing and that $\sqrt{A_{k+1}} - \sqrt{A_k} = \frac{A_{k+1} - A_k}{\sqrt{A_{k+1}} + \sqrt{A_k}}$, we deduce that

$$ \sqrt{A_{k+1}} - \sqrt{A_k} \leq \omega_{k+1}. $$

Summing this inequality, and using $r_k \leq \sqrt{A_k}$, gives the claim.


□ 


Following the proof of Theorem 5.2. Let us apply Lemma 5.2 to inequality (111) with $r_j = (j+a)\|x_{j+1} - x_j\|$, and $\omega_j = 2s(j+a)\|g_j\|$. By using the assumption on the perturbation term $\sum_k k\|g_k\| < +\infty$, we deduce that

$$ \sup_k k\|x_{k+1} - x_k\| < +\infty. \tag{112} $$

Injecting this information in (109), we obtain

$$ \sum_k a(2k + a - 2)\, d_k \leq C + s\sum_k (2k + 2a - 1)\,\theta_k + s\,\sup_k\left((k+a)\|x_{k+1} - x_k\|\right)\sum_k (k+a)\|g_k\|. \tag{113} $$

From $a = \alpha - 1 > 2$, (102), and the definition of $d_k$, we deduce that

$$ \sum_k k\|x_{k+1} - x_k\|^2 < +\infty, $$

which is our claim ii).

Step 3. The last step consists in applying Opial's lemma, whose discrete version is stated below.

Lemma 5.3. Let $S$ be a nonempty subset of $\mathcal{H}$, and $(x_k)$ a sequence of elements of $\mathcal{H}$. Assume that

(i) for every $z \in S$, $\lim_{k \to +\infty}\|x_k - z\|$ exists;

(ii) every weak sequential cluster point of the sequence $(x_k)$ belongs to $S$.

Then

$$ w\text{-}\lim_{k \to +\infty} x_k = x_\infty \text{ exists, for some element } x_\infty \in S. $$
We are going to apply Opial's lemma with $S = \operatorname{argmin}(\Phi + \Psi)$. By Theorem 5.1 we have $(\Phi + \Psi)(x_k) \to \min(\Phi + \Psi)$ (indeed, we have proved fast convergence). By the lower semicontinuity property of $\Phi + \Psi$ for the weak convergence of $\mathcal{H}$, we immediately obtain that item (ii) of Opial's lemma is satisfied. Thus the only point to verify is that $\lim\|x_k - x^*\|$ exists for any $x^* \in \operatorname{argmin}(\Phi + \Psi)$. Equivalently, we are going to show that $\lim h_k$ exists, with $h_k := \frac{1}{2}\|x_k - x^*\|^2$.
The beginning of the proof consists in establishing a discrete version of the second-order differential inequality obtained in the continuous case

$$ \ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) \leq \|\dot{x}(t)\|^2 + \|x(t) - x^*\|\,\|g(t)\|. $$

We use the parallelogram identity, which in an equivalent form can be written as follows: for any $a, b, c \in \mathcal{H}$

$$ \frac{1}{2}\|a - b\|^2 + \frac{1}{2}\|a - c\|^2 = \frac{1}{2}\|b - c\|^2 + \langle a - b, a - c \rangle. \tag{114} $$

Taking $b = x^*$, $a = x_{k+1}$, $c = x_k$, we obtain

$$ \frac{1}{2}\|x_{k+1} - x^*\|^2 + \frac{1}{2}\|x_{k+1} - x_k\|^2 = \frac{1}{2}\|x_k - x^*\|^2 + \langle x_{k+1} - x^*, x_{k+1} - x_k \rangle. $$

Equivalently,

$$ h_k - h_{k+1} = \frac{1}{2}\|x_{k+1} - x_k\|^2 + \langle x_{k+1} - x^*, x_k - x_{k+1} \rangle. \tag{115} $$

By definition of $y_k$ we have

$$ x_k - x_{k+1} = y_k - x_{k+1} - \frac{k-1}{k+\alpha-1}(x_k - x_{k-1}). $$

Replacing in (115), we obtain

$$ h_k - h_{k+1} = \frac{1}{2}\|x_{k+1} - x_k\|^2 + \langle x_{k+1} - x^*, y_k - x_{k+1} \rangle - \frac{k-1}{k+\alpha-1}\langle x_{k+1} - x^*, x_k - x_{k-1} \rangle. \tag{116} $$

Let us now use the monotonicity property of $\partial\Phi$. Since $-s\nabla\Psi(x^*) \in s\partial\Phi(x^*)$, and $y_k - x_{k+1} - s\nabla\Psi(y_k) + sg_k \in s\partial\Phi(x_{k+1})$, we have

$$ \langle y_k - x_{k+1} - s\nabla\Psi(y_k) + sg_k + s\nabla\Psi(x^*), x_{k+1} - x^* \rangle \geq 0. $$

Equivalently

$$ \langle y_k - x_{k+1}, x_{k+1} - x^* \rangle + s\langle \nabla\Psi(x^*) - \nabla\Psi(y_k) + g_k, x_{k+1} - x^* \rangle \geq 0. $$

Replacing in (116) we obtain

$$ h_{k+1} - h_k + \frac{1}{2}\|x_{k+1} - x_k\|^2 + s\langle \nabla\Psi(y_k) - \nabla\Psi(x^*) - g_k, x_{k+1} - x^* \rangle - \frac{k-1}{k+\alpha-1}\langle x_{k+1} - x^*, x_k - x_{k-1} \rangle \leq 0. \tag{117} $$

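We now use the co-coercivity of $\nabla\Psi$

\begin{align}
\langle \nabla\Psi(y_k) - \nabla\Psi(x^*), x_{k+1} - x^* \rangle &= \langle \nabla\Psi(y_k) - \nabla\Psi(x^*), x_{k+1} - y_k \rangle + \langle \nabla\Psi(y_k) - \nabla\Psi(x^*), y_k - x^* \rangle \notag \\
&\geq \frac{1}{L}\|\nabla\Psi(y_k) - \nabla\Psi(x^*)\|^2 + \langle \nabla\Psi(y_k) - \nabla\Psi(x^*), x_{k+1} - y_k \rangle \notag \\
&\geq \frac{1}{L}\|\nabla\Psi(y_k) - \nabla\Psi(x^*)\|^2 - \|\nabla\Psi(y_k) - \nabla\Psi(x^*)\|\,\|x_{k+1} - y_k\| \tag{118} \\
&\geq -\frac{L}{2}\|x_{k+1} - y_k\|^2. \notag
\end{align}

Combining (117) and (118)

$$ h_{k+1} - h_k + \frac{1}{2}\|x_{k+1} - x_k\|^2 - \frac{sL}{2}\|x_{k+1} - y_k\|^2 - s\|g_k\|\,\|x_{k+1} - x^*\| - \frac{k-1}{k+\alpha-1}\langle x_{k+1} - x^*, x_k - x_{k-1} \rangle \leq 0. \tag{119} $$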

Let us use again (114) with $b = x^*$, $a = x_k$, $c = x_{k-1}$. We obtain

$$ \frac{1}{2}\|x_k - x^*\|^2 + \frac{1}{2}\|x_k - x_{k-1}\|^2 = \frac{1}{2}\|x_{k-1} - x^*\|^2 + \langle x_k - x^*, x_k - x_{k-1} \rangle. $$

Equivalently

$$ h_{k-1} - h_k = \frac{1}{2}\|x_k - x_{k-1}\|^2 - \langle x_k - x^*, x_k - x_{k-1} \rangle. \tag{120} $$

Combining (119) with (120) we obtain

\begin{align}
h_{k+1} - h_k - \frac{k-1}{k+\alpha-1}(h_k - h_{k-1}) \leq{} & -\frac{1}{2}\|x_{k+1} - x_k\|^2 + \frac{sL}{2}\|x_{k+1} - y_k\|^2 + s\|g_k\|\,\|x_{k+1} - x^*\| \tag{121} \\
& + \frac{k-1}{k+\alpha-1}\left(\frac{1}{2}\|x_k - x_{k-1}\|^2 + \langle x_k - x_{k-1}, x_{k+1} - x_k \rangle\right). \notag
\end{align}
By definition of $y_k = x_k + \frac{k-1}{k+\alpha-1}(x_k - x_{k-1})$, we have $x_{k+1} - y_k = x_{k+1} - x_k - \frac{k-1}{k+\alpha-1}(x_k - x_{k-1})$. Hence

$$ \|x_{k+1} - y_k\|^2 = \|x_{k+1} - x_k\|^2 + \left(\frac{k-1}{k+\alpha-1}\right)^2\|x_k - x_{k-1}\|^2 - 2\frac{k-1}{k+\alpha-1}\langle x_{k+1} - x_k, x_k - x_{k-1} \rangle. $$

Substituting in (121), we obtain

$$ h_{k+1} - h_k - \gamma_k(h_k - h_{k-1}) \leq -\frac{1}{2}(1 - sL)\|x_{k+1} - y_k\|^2 + s\|g_k\|\,\|x_{k+1} - x^*\| + \frac{1}{2}\left(\gamma_k + \gamma_k^2\right)\|x_k - x_{k-1}\|^2, \tag{122} $$

where $\gamma_k = \frac{k-1}{k+\alpha-1}$. Since $0 < s < \frac{1}{L}$ we have $1 - sL > 0$. On the other hand, since $\gamma_k < 1$, we have $\gamma_k + \gamma_k^2 < 2\gamma_k$.

Hence

$$ h_{k+1} - h_k - \gamma_k(h_k - h_{k-1}) \leq s\|g_k\|\,\|x_{k+1} - x^*\| + 2\gamma_k\|x_k - x_{k-1}\|^2. \tag{123} $$

By (100), we know that the sequence $(z_k)$ is bounded. By (112), we know that $\sup_k k\|x_{k+1} - x_k\| < +\infty$. Since, by (79), $x_k = z_{k+1} - \frac{k+\alpha-1}{\alpha-1}(x_{k+1} - x_k)$, we deduce that the sequence $(x_k)$ is bounded. Returning to (123), we have, for some constant $C$

$$ h_{k+1} - h_k - \gamma_k(h_k - h_{k-1}) \leq C\|g_k\| + 2\gamma_k\|x_k - x_{k-1}\|^2. \tag{124} $$

We now use the estimate that we obtained in Step 2, namely $\sum_k k\|x_{k+1} - x_k\|^2 < +\infty$. Combined with the assumption $\sum_k k\|g_k\| < +\infty$, we deduce that

$$ h_{k+1} - h_k - \gamma_k(h_k - h_{k-1}) \leq \omega_k, \tag{125} $$

for some nonnegative sequence $(\omega_k)$ such that $\sum_{k \in \mathbb{N}} k\,\omega_k < +\infty$. Taking the positive part, we obtain

$$ (h_{k+1} - h_k)^+ - \gamma_k(h_k - h_{k-1})^+ \leq \omega_k. \tag{126} $$

We now use the following lemma, which is a discrete version of Lemma 3.2.


Lemma 5.4. Let $(a_k)$ be a sequence of nonnegative real numbers such that, for all $k \geq 1$

$$ a_{k+1} \leq \frac{k-1}{k+\alpha-1}\, a_k + \omega_k, $$

where $\alpha \geq 3$, and $\sum_k k\,\omega_k < +\infty$, with $\omega_k \geq 0$. Then the sequence $(a_k)$ is summable, i.e.,

$$ \sum_{k \in \mathbb{N}} a_k < +\infty. $$


Proof. Since $\alpha \geq 3$ we have $\alpha - 1 \geq 2$, and hence

$$ a_{k+1} \leq \frac{k-1}{k+2}\, a_k + \omega_k. $$

Multiplying this expression by $(k+1)^2$, we obtain

$$ (k+1)^2 a_{k+1} \leq \frac{(k-1)(k+1)^2}{k+2}\, a_k + (k+1)^2\omega_k. $$

Then note that, for all integer $k$

$$ \frac{(k-1)(k+1)^2}{k+2} \leq k^2. $$

Hence

$$ (k+1)^2 a_{k+1} \leq k^2 a_k + (k+1)^2\omega_k. $$

Summing this inequality from $j = 1$ to $j = k - 1$, we obtain

$$ k^2 a_k \leq a_1 + \sum_{j=1}^{k-1}(j+1)^2\omega_j. $$

Dividing by $k^2$, and summing with respect to $k$, we obtain

$$ \sum_k a_k \leq a_1\sum_k \frac{1}{k^2} + \sum_k \frac{1}{k^2}\sum_{j=1}^{k-1}(j+1)^2\omega_j. $$

Applying Fubini theorem to this last sum, we obtain

$$ \sum_k a_k \leq a_1\sum_k \frac{1}{k^2} + \sum_j \left(\sum_{k=j+1}^{\infty}\frac{1}{k^2}\right)(j+1)^2\omega_j. $$

We have

$$ \sum_{k=j+1}^{\infty}\frac{1}{k^2} \leq \frac{1}{j}. $$

Hence

$$ \sum_k a_k \leq a_1\sum_k \frac{1}{k^2} + \sum_j \frac{(j+1)^2}{j}\,\omega_j < +\infty, $$

which by $\frac{(j+1)^2}{j} \leq 4j$ for $j \geq 1$ gives the claim.
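The mechanism of Lemma 5.4 can be observed numerically on the extremal sequence satisfying the recursion with equality. The sketch below is an illustration under assumptions chosen here: $\alpha = 3$, $a_1 = 1$, and $\omega_k = 1/k^3$ (so that $\sum_k k\,\omega_k < +\infty$); the function name `partial_sum_and_bound` is hypothetical. It compares a partial sum of $(a_k)$ with the explicit bound $a_1 \sum_k k^{-2} + \sum_j \frac{(j+1)^2}{j}\omega_j$ appearing at the end of the proof.

```python
import math


def partial_sum_and_bound(alpha=3.0, a1=1.0, n_terms=5000):
    """Return (a_1 + ... + a_N, bound from the proof of Lemma 5.4) with N = n_terms."""
    omega = lambda k: 1.0 / k**3                 # toy choice with sum_k k*omega_k finite
    a_k, total, weighted = a1, 0.0, 0.0
    for k in range(1, n_terms + 1):
        total += a_k                              # accumulates a_1 + ... + a_k
        weighted += (k + 1) ** 2 / k * omega(k)   # accumulates sum_j (j+1)^2/j * omega_j
        a_k = (k - 1) / (k + alpha - 1) * a_k + omega(k)   # equality case of the recursion
    bound = a1 * math.pi ** 2 / 6 + weighted      # uses sum_k 1/k^2 = pi^2/6
    return total, bound
```

The partial sums stay below the bound for every `n_terms`, which is the summability statement of the lemma in this particular instance.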

End of the proof of Theorem 5.2. Let us apply Lemma 5.4 with $a_k = (h_k - h_{k-1})^+$. We obtain

$$ \sum_k (h_k - h_{k-1})^+ < +\infty, $$

which, combined with $h_k$ nonnegative, gives the convergence of the sequence $(h_k)$, and ends the proof.


□ 


□ 